Abstract. We discuss three major classes of theorems in additive and extremal
combinatorics: decomposition theorems, approximate structure theorems, and transference
principles. We also show how the finite-dimensional Hahn-Banach theorem can be used
to give short and transparent proofs of many results of these kinds. Amongst the
applications of this method is a much shorter proof of one of the major steps in the proof of
Green and Tao that the primes contain arbitrarily long arithmetic progressions. In order
to explain the role of this step, we include a brief description of the rest of their argument.
A similar proof has been discovered independently by Reingold, Trevisan, Tulsiani and
Vadhan [RTTV].

W.T. GOWERS

Contents

1. Introduction
2. Some basic concepts of additive combinatorics.
2.1. Preliminaries: Fourier transforms and Lp-norms.
2.2. What is additive combinatorics about?
2.3. Uniformity norms for subsets of finite Abelian groups.
2.4. Uniformity norms for graphs and hypergraphs.
2.5. Easy structure theorems for the $U^2$-norm.
2.6. Inverse theorems.
2.7. Higher Fourier analysis.
2.8. Easy structure theorems for graphs.
3. The Hahn-Banach theorem and simple applications.
3.1. A simple structure theorem.
3.2. Deducing decomposition theorems from inverse theorems.
4. The positivity and boundedness problems.
4.1. Algebra norms, polynomial approximation and a first transference theorem.
4.2. Approximate duality and algebra-like structures.
4.3. A generalization of the Green-Tao-Ziegler transference theorem.
4.4. Arithmetic progressions in the primes.
5. Tao's structure theorem.
5.1. A proof of the structure theorem.
5.2. Decomposition theorems with bounds on ranges
5.3. Applying Tao's structure theorem
References



1. Introduction 



This paper has several purposes. One is to provide a survey of some of the major 
recent developments in the rapidly growing field that has come to be known as additive 
combinatorics, focusing on three classes of theorems: decomposition theorems, approximate 
structural theorems and transference principles. (An explanation of these phrases will 
be given in just a moment.) A second is to show how the Hahn-Banach theorem leads 
to a simple and flexible method for proving results of these three kinds. A third is to 
demonstrate this by actually giving simpler proofs of several important results, or parts of 
results. One of the proofs we shall simplify is the proof of Green and Tao that the primes 
contain arbitrarily long arithmetic progressions [GT1], which leads to the fourth purpose of
this paper: to provide a partial guide to their paper. We shall give a simple proof of a result 
that is implicit in their paper, and made explicit in a later paper of Tao and Ziegler, and 
then we shall explain informally how they use this result to prove their famous theorem. A 
proof along similar lines has been discovered independently by Reingold, Trevisan, Tulsiani 
and Vadhan [RTTV]. We have tried to design this paper so that the reader who is just
interested in the Green-Tao theorem can get away with reading only a small part of it.
However, the earlier sections of the paper provide considerable motivation for the later 
arguments, so such a reader would be well-advised at least to skim the sections that are 
not strictly speaking necessary. 

Now let us describe the classes of theorems that will principally concern us. By a 
decomposition theorem we mean a statement that tells us that a function $f$ with certain
properties can be decomposed as a sum $\sum_{i=1}^k g_i$, where the functions $g_i$ have certain other
properties. There are two kinds of decomposition theorem that have been particularly
useful. One kind says that $f$ can be written as $\sum_{i=1}^k g_i + h$, where the functions $g_i$ have
some explicit description and $h$, the "error term", is in a useful sense small.

Another kind, which is closely related, brings us to our second class of results. An
approximate structure theorem is a result that says that, under appropriate conditions,
we can write a function $f$ as $f_1 + f_2$, where $f_1$ is "structured" in some sense, and $f_2$ is
"quasirandom". The rough idea is that the structure of $f_1$ is strong enough for us to be
able to analyse it reasonably explicitly, and the quasirandomness of $f_2$ is strong enough
for many properties of $f_1$ to be unaffected if we "perturb" it to $f_1 + f_2$. Often, in order to
obtain stronger statements about the structure and the quasirandomness, one allows also
a small $L_2$-error: that is, one writes $f$ as $f_1 + f_2 + f_3$ with $f_1$ structured, $f_2$ quasirandom,
and $f_3$ small in $L_2$.

A transference principle is a statement to the effect that a function $f$ that belongs to
some space $X$ of functions can be approximated by a function $g$ that belongs to another
space $Y$. Such a statement is useful if the functions in $Y$ are easier to handle than the
functions in $X$ and the approximation is of a kind that preserves the properties that one
is interested in. As we shall see later, a transference principle is the fundamental step in
the proof of Green and Tao. For now, let us merely note that a transference principle is a
particular kind of decomposition theorem: it tells us that $f$ can be written as $g + h$, where
$g \in Y$ and $h$ is small in an appropriate sense.

As well as the Green-Tao theorem, we shall discuss several other results in additive
combinatorics. One is a structure theorem proved by Tao in an important paper [T1] that
gives a discretization of Furstenberg's ergodic-theory proof [Fu] of Szemerédi's theorem
[S1], or more precisely a somewhat different ergodic-theory argument due to Host and Kra
[HK05]. We shall give an alternative proof of (a slight generalization of) this theorem, and
give some idea of how it can be used to prove other results. Amongst these other results
are Roth's theorem [Rot], which states that every set of integers of positive upper density
contains an arithmetic progression of length 3, and Szemerédi's regularity lemma [S2], a
cornerstone of extremal graph theory, which shows that every graph can be approximated
by a disjoint union of boundedly many quasirandom graphs (and which is a very good
example of an approximate structure theorem).

The remaining sections of this paper are organized as follows. The next section in- 
troduces several norms that are used to define quasirandomness. Strictly speaking, it is 
independent of much of the rest of the paper, since many of our results will be rather general 
ones about norms that satisfy various hypotheses. However, for the reader who is unfa- 
miliar with the basic concepts of additive combinatorics it may not be obvious that these 
hypotheses are satisfied except in one or two very special cases: section 2 should convince
such a reader that the general results can be applied in many interesting contexts. 

In section 3, we introduce our main tool, the finite-dimensional Hahn-Banach theorem, 
and we give one or two very easy consequences of it. Even these consequences are of 
interest, as we shall explain — one of them is a non-trivial decomposition theorem of the 
first kind discussed above — but the method comes into its own when we introduce one or 
two further ideas in order to obtain conclusions that can be applied much more widely. 

One of these ideas is the relatively standard one of polynomial approximations. Often
we start with a function $f$ that takes values in an interval $[a, b]$, and we want its structured
part $f_1$ to take values in $[a, b]$ as well. If $f_1$ is bounded, and if the class of structured
functions is closed under composition with polynomials, then we can sometimes achieve
this by choosing a polynomial $P$ such that $P(x)$ approximates $a$ when $x < a$, $x$ when
$a \le x \le b$, and $b$ when $x > b$. Then the function $P \circ f_1$ takes values in $[a, b]$ (approximately),
and under appropriate circumstances it is possible to argue that it approximates $f_1$. In
section 4, we shall illustrate this technique by proving two results. The first is a fairly
simple transference principle that we shall need later, and the second is a slightly more
complicated version of it that is needed for proving the Green-Tao theorem. The latter is
essentially the same as the "abstract structure theorem" of Tao and Ziegler [TZ], so called
because it is an abstraction of arguments from the paper of Green and Tao. It is this
second result that can be regarded as a major step in the proof of the Green-Tao theorem,
and which is used to prove their transference principle. We shall end Section 4 with a brief
description of the rest of the proof of Green and Tao.

In section 5, we shall prove the structure theorem of Tao mentioned earlier, and show 
how it leads to a strengthened decomposition theorem. We end the section, and the paper, 
with an indication of how to use the structure theorem. 

2. Some basic concepts of additive combinatorics. 
2.1. Preliminaries: Fourier transforms and Lp-norms. 

Let $G$ be a finite Abelian group. A character on $G$ is a non-zero function $\psi : G \to \mathbb{C}$
with the property that $\psi(xy) = \psi(x)\psi(y)$ for every $x$ and $y$. It is easy to show that $\psi$
must take values in the unit circle. It is also easy to show that two distinct characters
are orthogonal. To see this, note first that if $\psi_1$ and $\psi_2$ are distinct, then $\psi_1(\psi_2)^{-1}$ is a
non-trivial character (that is, a character that is not identically 1). Next, note that if $\psi$ is a
non-trivial character and $\psi(y) \ne 1$, then $\mathbb{E}_x\psi(x) = \mathbb{E}_x\psi(xy) = \psi(y)\mathbb{E}_x\psi(x)$, so $\mathbb{E}_x\psi(x) = 0$.
(The notation "$\mathbb{E}_x$" is shorthand for "$|G|^{-1}\sum_{x \in G}$".) This implies the orthogonality. Less
obvious, but a straightforward consequence of the classification of finite Abelian groups,
is the fact that the characters span all functions from $G$ to $\mathbb{C}$: that is, they form an
orthonormal basis of $L_2(G)$. (We shall discuss this space more in a moment.)

If $f : G \to \mathbb{C}$, then the Fourier transform $\hat f$ of $f$ tells us how to expand $f$ in terms of
the basis of characters. More precisely, one first defines the dual group $\hat G$ to be the group
of all characters on $G$ under pointwise multiplication. Then $\hat f$ is a function from $\hat G$ to $\mathbb{C}$,
defined by the formula

$$\hat f(\psi) = \mathbb{E}_x f(x)\overline{\psi(x)} = \mathbb{E}_x f(x)\psi(-x).$$

The Fourier inversion formula (which it is an easy exercise to verify) then tells us that

$$f(x) = \sum_{\psi} \hat f(\psi)\psi(x),$$

which gives the expansion of $f$ as a linear combination of characters.

There are two natural measures that one can put on $G$: the uniform probability measure,
and the counting measure (which assigns measure 1 to each singleton). Both of these are
useful. The former is useful when one is looking at functions that are "flat": an example
would be the characteristic function of a dense subset $A \subset G$. If we write $A(x)$ for $\chi_A(x)$,
then $\mathbb{E}_x A(x) = |G|^{-1}\sum_x A(x) = |A|/|G|$ is the density of $A$. The counting measure is
more useful for functions $F$ that are of "essentially bounded support", in the sense that
there is a set $K$ of bounded size such that $F$ is approximately equal (in some appropriate
sense) to its restriction to $K$.

If $f$ is a flat function, then there is a useful sense in which its Fourier transform is
of essentially bounded support in the dual group. Therefore, if we are interested in flat
functions defined on $G$, then we look at the uniform probability measure on $G$ and the
counting measure on the dual group $\hat G$. We then define inner products, $L_p$-norms, and
$\ell_p$-norms as follows.

The inner product of two functions $f$ and $g$ from $G$ to $\mathbb{C}$ is the quantity
$\langle f, g\rangle = \mathbb{E}_x f(x)\overline{g(x)}$. The resulting Euclidean norm is
$\|f\|_2 = (\mathbb{E}_x |f(x)|^2)^{1/2}$, and the Euclidean
space is $L_2$. More generally, $L_p$ is the space of all functions from $G$ to $\mathbb{C}$, with the norm
$\|f\|_p = (\mathbb{E}_x |f(x)|^p)^{1/p}$, where this is interpreted as $\max_x |f(x)|$ when $p = \infty$.
On the dual group $\hat G$ we have the same definitions, but with expectations replaced by
sums. Thus, $\langle F_1, F_2\rangle = \sum_\psi F_1(\psi)\overline{F_2(\psi)}$ and
$\|F\|_p = (\sum_\psi |F(\psi)|^p)^{1/p}$. The resulting space
is denoted $\ell_p$. Once again, $\|F\|_\infty$ is $\max_\psi |F(\psi)|$, so $L_\infty$ and $\ell_\infty$ are in fact the same space.

Two fundamental identities that are used repeatedly in additive combinatorics are the
convolution identity and Parseval's identity. The convolution $f * g$ of two functions
$f, g : G \to \mathbb{C}$ is defined by the formula

$$f * g(x) = \mathbb{E}_{y+z=x} f(y)g(z),$$

and the convolution identity states that $(f * g)^\wedge(\psi) = \hat f(\psi)\hat g(\psi)$ for every $\psi \in \hat G$. That
is, the Fourier transform "converts convolution into pointwise multiplication". It also does
the reverse: the Fourier transform of the pointwise product $fg$ is the convolution $\hat f * \hat g$,
where the latter is defined by the formula

$$\hat f * \hat g(\psi) = \sum_{\rho\sigma=\psi} \hat f(\rho)\hat g(\sigma).$$

Parseval's identity is the simple statement that $\langle f, g\rangle = \langle \hat f, \hat g\rangle$. It is important to keep
in mind that the two inner products are defined differently, one with expectations and the
other with sums, just as the convolutions were defined differently in $G$ and $\hat G$. Setting
$f = g$ in Parseval's identity, we deduce that $\|f\|_2 = \|\hat f\|_2$.
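The normalizations above (expectations on the group, sums on the dual group) are easy to mix up, so a small numerical check may help. The following is a minimal sketch, assuming the group is the cyclic group of integers mod $N$ with characters $x \mapsto e^{2\pi i rx/N}$, and representing a function as a plain Python list; all helper names are illustrative choices rather than notation from the text.

```python
import cmath

def character(r, x, N):
    """Value at x of the character x -> omega^(r x) on Z_N, omega = exp(2 pi i / N)."""
    return cmath.exp(2j * cmath.pi * ((r * x) % N) / N)

def fourier(f):
    """f_hat(r) = E_x f(x) omega^(-r x): an expectation over the group."""
    N = len(f)
    return [sum(f[x] * character(-r, x, N) for x in range(N)) / N for r in range(N)]

def invert(f_hat):
    """Inversion formula f(x) = sum_r f_hat(r) omega^(r x): a sum over the dual group."""
    N = len(f_hat)
    return [sum(f_hat[r] * character(r, x, N) for r in range(N)) for x in range(N)]

def convolve(f, g):
    """f*g(x) = E_{y+z=x} f(y) g(z), with the 1/N normalization of an expectation."""
    N = len(f)
    return [sum(f[y] * g[(x - y) % N] for y in range(N)) / N for x in range(N)]

def inner_group(f, g):
    """<f, g> on the group: expectation of f(x) conj(g(x))."""
    return sum(a * b.conjugate() for a, b in zip(f, g)) / len(f)

def inner_dual(F1, F2):
    """<F1, F2> on the dual group: a sum, not an expectation."""
    return sum(a * b.conjugate() for a, b in zip(F1, F2))

# Arbitrary test functions on Z_7 (values chosen only for illustration).
f = [1, 0, 2, -1, 0.5, 3, 0]
g = [0, 1, 1, 0, -2, 0, 1j]
f_hat, g_hat = fourier(f), fourier(g)

inversion_error = max(abs(a - b) for a, b in zip(invert(f_hat), f))
convolution_error = max(abs(a - b * c)
                        for a, b, c in zip(fourier(convolve(f, g)), f_hat, g_hat))
parseval_error = abs(inner_group(f, g) - inner_dual(f_hat, g_hat))
print(inversion_error, convolution_error, parseval_error)
```

All three errors should be at the level of floating-point rounding; replacing the expectation by a sum in any one place breaks at least one of the identities.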

The group that will interest us most is the cyclic group $\mathbb{Z}_N = \mathbb{Z}/N\mathbb{Z}$. If we set
$\omega = \exp(2\pi i/N)$, then any function of the form $x \mapsto \omega^{rx}$ is a character, and the functions
$x \mapsto \omega^{rx}$ and $x \mapsto \omega^{sx}$ are distinct if and only if $r$ and $s$ are not congruent mod $N$. Therefore, one can
identify $\mathbb{Z}_N$ with its dual, writing

$$\hat f(r) = \mathbb{E}_x f(x)\omega^{-rx}$$

whenever $r$ is an element of $\mathbb{Z}_N$. However, the measure we use on $\mathbb{Z}_N$ is different when we
are thinking of it as the dual group.

The reason that Fourier transforms are important in additive combinatorics is that 
many quantities that arise naturally can be expressed in terms of convolutions, which can 
then be simplified by the Fourier transform. For instance, as we shall see in the next 
subsection, the quantity $\mathbb{E}_{x,d} f(x)f(x + d)f(x + 2d)$ arises naturally when one looks at
arithmetic progressions of length 3. This can be rewritten as $\mathbb{E}_{x,z} f(x)f(z)f((x + z)/2) =
\mathbb{E}_{x,z} f(x)f(z)g(x + z)$, where $g(u) = f(u/2)$. (We need $N$ to be odd for this to make sense.)
This is the inner product of $f * f$ with $\overline{g}$, so it is equal to $\langle \hat f^2, \hat{\overline{g}}\rangle$.
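To make the last step concrete: unpacking the inner product on the Fourier side for the cyclic group gives $\mathbb{E}_{x,d} f(x)f(x+d)f(x+2d) = \sum_r \hat f(r)^2 \hat f(-2r)$. This expanded form is our own unpacking of the identity, and the sketch below simply checks it numerically on a small, arbitrarily chosen set; the function names are illustrative.

```python
import cmath

def fourier(f):
    """f_hat(r) = E_x f(x) omega^(-r x) on Z_N."""
    N = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * ((r * x) % N) / N)
                for x in range(N)) / N
            for r in range(N)]

def ap3_direct(f):
    """E_{x,d} f(x) f(x+d) f(x+2d), including degenerate progressions with d = 0."""
    N = len(f)
    return sum(f[x] * f[(x + d) % N] * f[(x + 2 * d) % N]
               for x in range(N) for d in range(N)) / N ** 2

def ap3_fourier(f):
    """The same average computed as sum_r f_hat(r)^2 f_hat(-2r)."""
    N = len(f)
    f_hat = fourier(f)
    return sum(f_hat[r] ** 2 * f_hat[(-2 * r) % N] for r in range(N))

N = 11  # odd, matching the assumption in the text
A = [1 if x in {0, 1, 3, 4, 9} else 0 for x in range(N)]  # arbitrary example set
direct = ap3_direct(A)
via_fourier = ap3_fourier(A)
print(direct, via_fourier)
```

The direct count costs $N^2$ operations while the Fourier-side sum has only $N$ terms, which is one reason the rewriting is useful.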
2.2. What is additive combinatorics about?

The central objects of study in additive combinatorics are finite subsets of Abelian groups. 
For example, one of the main results in the area, Szemerédi's theorem, can be formulated
as follows.

Theorem 2.1. For every $\delta > 0$ and every positive integer $k$ there exists $N$ such that every
subset $A \subset \mathbb{Z}_N$ of cardinality at least $\delta N$ contains an arithmetic progression of length $k$.

Here, $\mathbb{Z}_N$ stands for the cyclic group $\mathbb{Z}/N\mathbb{Z}$ of integers mod $N$, and an arithmetic
progression of length $k$ means a set of the form $\{x, x + d, \dots, x + (k - 1)d\}$ with $d \ne 0$.

What are the maps of interest between finite subsets of Abelian groups? An initial guess 
might be that they were restrictions of group homomorphisms, but that turns out to be 
far too narrow a definition. Instead, they are functions called Freiman homomorphisms. 
A Freiman homomorphism of order $k$ between sets $A$ and $B$ is a function $\phi : A \to B$ such
that

$$\phi(a_1) + \phi(a_2) + \dots + \phi(a_k) = \phi(a_{k+1}) + \phi(a_{k+2}) + \dots + \phi(a_{2k})$$

whenever

$$a_1 + a_2 + \dots + a_k = a_{k+1} + a_{k+2} + \dots + a_{2k}.$$

In particular, a Freiman homomorphism of order 2, often just known as a Freiman
homomorphism, is a function such that $\phi(a_1) + \phi(a_2) = \phi(a_3) + \phi(a_4)$ whenever $a_1 + a_2 = a_3 + a_4$.
This is equivalent to the same definition with minus instead of plus, which is often more 
convenient. 

A Freiman isomorphism of order $k$ is a Freiman homomorphism of order $k$ with an
inverse that is also a Freiman homomorphism of order $k$. The rough idea is that a Freiman
homomorphism of order $k$ preserves all the linear structure of a set $A$ that can be detected
by integer combinations with coefficients adding up to 0 and with absolute values adding
up to at most $2k$. For example, it is an easy exercise to show that if $A$ is an arithmetic
progression and $B$ is Freiman-isomorphic to $A$, then $B$ is also an arithmetic progression.
This is because a sequence $(x_1, x_2, \dots, x_m)$ is an arithmetic progression, written out in a
sensible order, if and only if $x_{i+2} - x_{i+1} = x_{i+1} - x_i$ for every $i$. It is also easy to show that
if $A$ and $B$ are isomorphic, then their sumsets $A + A$ and $B + B$ have the same size.
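The definition can be tested mechanically on small examples. The sketch below is a brute-force check for finite sets of integers, assuming the map is given as a Python dictionary; the function names and the two example maps are illustrative choices, not taken from the text.

```python
from itertools import combinations_with_replacement

def is_freiman_hom(phi, A, k=2):
    """Check that phi (a dict defined on A) is a Freiman homomorphism of order k.

    Addition is commutative, so it is enough to look at multisets of size k:
    any two k-multisets from A with equal sums must have equal image sums.
    """
    image_sum_for = {}
    for combo in combinations_with_replacement(sorted(A), k):
        s = sum(combo)
        t = sum(phi[a] for a in combo)
        if image_sum_for.setdefault(s, t) != t:
            return False
    return True

def is_freiman_iso(phi, A, k=2):
    """A bijection onto its image whose inverse is also a Freiman homomorphism."""
    B = {phi[a] for a in A}
    if len(B) != len(A):
        return False
    inverse = {phi[a]: a for a in A}
    return is_freiman_hom(phi, A, k) and is_freiman_hom(inverse, B, k)

def sumset(A):
    return {a + b for a in A for b in A}

A = {0, 1, 2, 3, 4}
dilate = {a: 7 * a + 3 for a in A}   # affine map: sends a progression to a progression
square = {a: a * a for a in A}       # fails: 0 + 2 = 1 + 1 but 0 + 4 != 1 + 1

print(is_freiman_iso(dilate, A), is_freiman_hom(square, A))
print(len(sumset(A)), len(sumset(set(dilate.values()))))
```

The affine map is a Freiman isomorphism of order 2 and preserves the size of the sumset, while the squaring map already fails the order-2 condition.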

Thus, a more precise description of the main objects studied by additive combinatorics 
would be that they are finite subsets of Abelian groups, up to Freiman isomorphisms of 
various orders. 
An important aspect of results such as Szemerédi's theorem is that they have a
certain "robustness". For instance, combining Szemerédi's theorem with a simple averaging
argument, one can deduce the following corollary (which was first noted by Varnavides [V]).

Corollary 2.2. For every $\delta > 0$ and every positive integer $k$ there exists $\epsilon > 0$ such that,
for every sufficiently large positive integer $N$, every subset $A \subset \mathbb{Z}_N$ of cardinality at least
$\delta N$ contains at least $\epsilon N^2$ arithmetic progressions of length $k$.

A second important aspect is that they have "functional versions". One can regard
a subset of $\mathbb{Z}_N$ as a function that takes values 0 and 1. It turns out that many of the
arguments used to prove Szemerédi's theorem apply to a much wider class of functions.
In particular, they apply to functions that take values in the interval $[0, 1]$. The following
generalization of Szemerédi's theorem is easily seen to follow from Corollary 2.2.

Corollary 2.3. For every $\delta > 0$ and every positive integer $k$ there exists $\epsilon > 0$ such that,
for every positive integer $N$ and every function $f : \mathbb{Z}_N \to [0, 1]$ for which $\mathbb{E}_x f(x) \ge \delta$, we
have the inequality

$$\mathbb{E}_{x,d} f(x)f(x + d) \dots f(x + (k - 1)d) \ge \epsilon.$$

Here, $\mathbb{E}_{x,d}$ denotes the expectation over all pairs $(x, d) \in \mathbb{Z}_N^2$. It can be regarded as
shorthand for $N^{-2}\sum_{x,d}$, but it is better to think in probabilistic terms: the left-hand side
of the above inequality is then an expectation over all arithmetic progressions of length $k$
(including degenerate ones with $d = 0$, but for large $N$ these make a tiny contribution to
the total).

A third important aspect is a deeper form of robustness. It turns out that quantities such
as $\mathbb{E}_{x,d} f(x)f(x + d) \dots f(x + (k - 1)d)$ are left almost unchanged if you perturb $f$ by adding
a function $g$ that is small in an appropriate norm. Furthermore, it is possible for $g$ to be
small in this norm even when the average size $\mathbb{E}_x |g(x)|$ of $g(x)$ is large: a typical example
of such a function is one that takes the values $\pm 1$ independently at random. The changes
to the values of $f$ are then quite large, but the randomness of $g$ forces their contribution to
expressions such as $\mathbb{E}_{x,d} f(x)f(x + d) \dots f(x + (k - 1)d)$ to cancel out almost completely. This
cancellation, rather than smallness of a more obvious kind, is what justifies our thinking
of $f + g$ as a "perturbation" of $f$.

Thus, it is tempting to revise further our rough definition of additive combinatorics 
and say that the central objects of study are subsets of Abelian groups, up to Freiman 
isomorphism and "quasirandom perturbation". However, it takes some effort to make 
this idea precise, since the notion of a Freiman homomorphism does not apply as well to 
functions as it does to sets (because it is insufficiently robust). Also, not every quantity
of importance in the area is approximately invariant up to quasirandom perturbations: an
example of one that isn't is the size of the sumset $A + A$ of a set $A$ of size $n$. So we shall
content ourselves with the observation that all the results of this paper are approximately 
invariant. 

So that we can say what this means, let us give some examples of norms that measure
quasirandomness.

2.3. Uniformity norms for subsets of finite Abelian groups. 

Let $G$ be a finite Abelian group, and let $g : G \to \mathbb{C}$. The $U^2$-norm of $g$ is defined by the
formula

$$\|g\|_{U^2}^4 = \mathbb{E}_{x,a,b}\, g(x)\overline{g(x + a)}\,\overline{g(x + b)}\,g(x + a + b).$$

We shall not give here the verification that this is a norm (though it will follow from a
remark we make in 2.5), since our main concern is the sense in which it measures
quasirandomness. It can be shown that if $f : G \to \mathbb{C}$ is a function with $\|f\|_\infty \le 1$ and $g$ is another
such function with the additional property that $\|g\|_{U^2}$ is small, then

$$\mathbb{E}_{x,d} f(x)f(x + d)f(x + 2d) \approx \mathbb{E}_{x,d} (f + g)(x)(f + g)(x + d)(f + g)(x + 2d).$$

A case of particular interest is when $f$ is the characteristic function of a subset $A \subset G$ of
density $\delta$, which again we shall denote by $A$, and $g(x) = A(x) - \delta$ for every $x$. If $\|g\|_{U^2}$ is
small, then we can think of $f$ as a quasirandom perturbation of the constant function $\delta$.
Then $\mathbb{E}_{x,d} A(x)A(x + d)A(x + 2d)$ will be around $\delta^3$, the approximate value it would take
(with high probability) if the elements of $A$ were chosen independently at random with
probability $\delta$. When $\|g\|_{U^2}$ is small, we say that $A$ is a quasirandom subset of $G$. (This
definition is essentially due to Chung and Graham [CG].)
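As a rough numerical illustration of this statement, the sketch below draws a random subset of the cyclic group of order $N$ (with $N$ odd), computes the $U^2$-norm of its balanced function directly from the definition, and compares the 3-term progression average with $\delta^3$. The factor 7 in the bound is our own crude bookkeeping (the difference expands into seven cross terms, each controlled by $\|g\|_{U^2}$ when $N$ is odd and all functions are bounded by 1); it is an assumption of this sketch, not a constant taken from the text.

```python
import random

def u2_norm(g):
    """||g||_{U^2} from the definition:
    fourth root of E_{x,a,b} g(x) conj(g(x+a)) conj(g(x+b)) g(x+a+b)."""
    N = len(g)
    total = sum(g[x] * g[(x + a) % N].conjugate()
                * g[(x + b) % N].conjugate() * g[(x + a + b) % N]
                for x in range(N) for a in range(N) for b in range(N))
    return (max(total.real, 0.0) / N ** 3) ** 0.25

def ap3(f):
    """E_{x,d} f(x) f(x+d) f(x+2d) on Z_N."""
    N = len(f)
    return sum(f[x] * f[(x + d) % N] * f[(x + 2 * d) % N]
               for x in range(N) for d in range(N)) / N ** 2

random.seed(0)   # fixed seed so the run is reproducible
N = 31           # odd, so x -> 2x is a bijection of Z_N
A = [1 if random.random() < 0.5 else 0 for _ in range(N)]
delta = sum(A) / N                    # actual density of the sampled set
g = [a - delta for a in A]            # balanced function A - delta
gap = abs(ap3(A) - delta ** 3)        # deviation from the "random" value
bound = 7 * u2_norm(g)                # crude bound described in the lead-in
print(delta, u2_norm(g), gap, bound)
```

For such a small $N$ the norm is not especially small, so this only illustrates the direction of the inequality rather than the asymptotic statement.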

In many respects, a quasirandom set behaves as one would expect a random set to
behave, but in by no means all. For example, even if $A$ is as quasirandom as it is possible
for a set to be, it does not follow that

$$\mathbb{E}_{x,d} A(x)A(x + d)A(x + 2d)A(x + 3d) \approx \delta^4.$$

An example that shows this is the subset $A \subset \mathbb{Z}_N$ that consists of all $x$ such that $x^2 \in
[-\delta N/2, \delta N/2]$. It can be shown that the density of $A$ is very close to $\delta$ when $N$ is large,
and that $\|g\|_{U^2} = \|A - \delta\|_{U^2}$ is extremely small. However, for this set $A$,

$$\mathbb{E}_{x,d} A(x)A(x + d)A(x + 2d)A(x + 3d)$$

turns out to be at least $c\delta^3$ for some absolute constant $c > 0$. We will not prove this here
(a proof can be found in [G3]), but we give the example in order to draw attention to its
quadratic nature. It turns out that this feature of the example is necessary, though quite
what that means is not obvious, and the proof is even less so. See subsection 2.6 for further
discussion of this.

This example shows that the smallness of the $U^2$-norm is not sufficient to explain all the
typical behaviour of a random function. For this one needs to introduce "higher" uniformity
norms, of which the next one is (unsurprisingly) the $U^3$-norm. If $g$ is a function, then $\|g\|_{U^3}^8$
is given by the expression

$$\mathbb{E}_{x,a,b,c}\, g(x)\overline{g(x + a)}\,\overline{g(x + b)}\,\overline{g(x + c)}\,g(x + a + b)g(x + a + c)g(x + b + c)\overline{g(x + a + b + c)}.$$

From this it is easy to guess the definition of the $U^k$ norm, but for completeness here is a
formula for it:

$$\|g\|_{U^k}^{2^k} = \mathbb{E}_{x,a_1,\dots,a_k} \prod_{\epsilon \in \{0,1\}^k} C^{|\epsilon|} g\Bigl(x + \sum_{i=1}^k \epsilon_i a_i\Bigr),$$

where $C$ denotes the operation of complex conjugation and $|\epsilon|$ denotes the number of
non-zero coordinates of $\epsilon$.
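The general formula translates directly into a brute-force computation. The following minimal sketch works on the cyclic group of order $N$ with a function stored as a list; it is only practical for very small $N$ and $k$, and the quadratic-phase example $x \mapsto \omega^{x^2}$ is our own choice of illustration, picked because it has small $U^2$-norm but $U^3$-norm equal to 1.

```python
import cmath
from itertools import product

def uk_norm(g, k):
    """||g||_{U^k} on Z_N by brute force over x, a_1, ..., a_k."""
    N = len(g)
    total = 0
    for x in range(N):
        for a in product(range(N), repeat=k):
            term = 1
            for eps in product((0, 1), repeat=k):
                value = g[(x + sum(e * ai for e, ai in zip(eps, a))) % N]
                # C^{|eps|}: conjugate when eps has an odd number of non-zero coordinates
                term *= value.conjugate() if sum(eps) % 2 else value
            total += term
    return (max(total.real, 0.0) / N ** (k + 1)) ** (1 / 2 ** k)

N = 5
ones = [1] * N
# Quadratic phase x -> omega^(x^2), omega = exp(2 pi i / N)
q = [cmath.exp(2j * cmath.pi * ((x * x) % N) / N) for x in range(N)]

print(uk_norm(ones, 2), uk_norm(q, 2), uk_norm(q, 3))
```

For prime $N$ the quadratic phase has $U^2$-norm $N^{-1/4}$, which tends to zero, while its $U^3$-norm stays equal to 1: the higher norm detects quadratic structure that the $U^2$-norm misses.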

These norms were introduced in [G1], where it was shown, as part of a proof of
Szemerédi's theorem, that if $A$ is a subset of $\mathbb{Z}_N$ of density $\delta$, $g(x) = A(x) - \delta$ for every $x$,
and $\|g\|_{U^k}$ is sufficiently small (meaning smaller than a positive constant that depends on
$\delta$ but not on $N$), then

$$\mathbb{E}_{x,d} A(x)A(x + d) \dots A(x + kd) \approx \delta^{k+1}.$$

Let us call a set uniform of degree $k - 1$ if its $U^k$-norm is small. Then the above assertion
is that a set of density $\delta$ that is sufficiently uniform of degree $k - 1$ contains roughly as
many arithmetic progressions (mod $N$) of length $k + 1$ as a random set of density $\delta$ will
(with high probability) contain. In particular, if $A$ is quadratically uniform (meaning that
the $U^3$-norm of $A - \delta$ is sufficiently small), then

$$\mathbb{E}_{x,d} A(x)A(x + d)A(x + 2d)A(x + 3d) \approx \delta^4.$$

The arithmetic progression $\{x, x + d, \dots, x + (k - 1)d\}$ can be thought of as a collection
of $k$ linear forms in $x$ and $d$. It can be shown that for any collection of linear forms in any
number of variables, there exists a $k$ such that every set $A$ that is sufficiently uniform of
degree $k$ contains about as many of the corresponding linear configurations as a random
set of the same density. This was shown by Green and Tao [GT3], who generalized the
argument in [G1]. The question of precisely which $U^k$ norm is needed is a surprisingly
subtle one. It is conjectured in [GW1] that the answer is the smallest $k$ for which the
$k$th powers of the linear forms in question are linearly independent. For instance, the
configuration $\{x, x + d, x + 2d, x + 3d\}$ needs the $U^3$-norm because $x - 2(x + d) + (x + 2d) =
x^2 - 3(x + d)^2 + 3(x + 2d)^2 - (x + 3d)^2 = 0$, but the cubes are linearly independent. A special
case of this result is proved in [GW1] using "quadratic Fourier analysis", which we will
discuss in 2.7. To prove the full conjecture would require a theory of higher-degree Fourier
analysis that will probably exist in due course but which has not yet been sufficiently
developed.

2.4. Uniformity norms for graphs and hypergraphs. 

There are very close and important parallels between uniformity of subsets of finite Abelian 
groups, and quasirandomness of graphs and hypergraphs. For this reason, even though the 
relevant parts of graph and hypergraph theory belong to extremal combinatorics, they have 
become part of additive combinatorics as well: one could call them additive combinatorics 
without the addition. 

Since that may seem a peculiar thing to say, let us briefly see what these parallels are. 
Let $G$ be a graph on $n$ vertices. One can think of $G$ as a two-variable function $G(x, y)$,
where $x$ and $y$ are vertices and $G(x, y) = 1$ if $xy$ is an edge and 0 otherwise. Just as we
may regard a subset $A$ of a finite Abelian group as quasirandom if a certain norm of $A - \delta$
is small, we can regard a graph as quasirandom if a certain norm of the function $G - \delta$
(where now $\delta$ is the density $\mathbb{E}_{x,y} G(x, y)$ of the graph $G$) is small. This norm is given by
the formula

$$\|g\|_{\square}^4 = \mathbb{E}_{x,x',y,y'}\, g(x, y)\overline{g(x, y')}\,\overline{g(x', y)}\,g(x', y'),$$

which makes sense, and is useful, whenever $X$ and $Y$ are finite sets and $g : X \times Y \to \mathbb{C}$. The
theory of quasirandom graphs was initiated by Thomason [Th] and more fully developed
by Chung, Graham and Wilson [CGW]. The definition we have just given is equivalent to
the definition in the latter paper.
To see how this relates to the $U^2$ norm, let $X$ and $Y$ equal a finite Abelian group $\Gamma$, let
$f : \Gamma \to \mathbb{C}$ and let $g(x, y) = f(x + y)$. Then

$$\|g\|_{\square}^4 = \mathbb{E}_{x,x',y,y'}\, f(x + y)\overline{f(x + y')}\,\overline{f(x' + y)}\,f(x' + y').$$

The quadruples $(x + y, x + y', x' + y, x' + y')$ are uniformly distributed over all quadruples
$(a, b, c, d)$ such that $a + d = b + c$. Since the same is true of all quadruples of the form
$(x, x + a, x + b, x + a + b)$, we see that the right-hand side of the above formula is nothing
other than $\|f\|_{U^2}^4$.
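This identification can be confirmed numerically. The sketch below is a minimal check, assuming the group is cyclic and that both norms are computed by brute force from their definitions; the function names and the test function are illustrative.

```python
def box_norm(g, X, Y):
    """Fourth root of E_{x,x',y,y'} g(x,y) conj(g(x,y')) conj(g(x',y)) g(x',y')."""
    total = sum(g(x, y) * g(x, y2).conjugate() * g(x2, y).conjugate() * g(x2, y2)
                for x in X for x2 in X for y in Y for y2 in Y)
    return (max(total.real, 0.0) / (len(X) ** 2 * len(Y) ** 2)) ** 0.25

def u2_norm(f):
    """Fourth root of E_{x,a,b} f(x) conj(f(x+a)) conj(f(x+b)) f(x+a+b) on Z_N."""
    N = len(f)
    total = sum(f[x] * f[(x + a) % N].conjugate()
                * f[(x + b) % N].conjugate() * f[(x + a + b) % N]
                for x in range(N) for a in range(N) for b in range(N))
    return (max(total.real, 0.0) / N ** 3) ** 0.25

N = 7
f = [1, 1j, 0, -1, 2, 0.5, 1]   # arbitrary complex-valued function on Z_7

def g(x, y):
    """Two-variable function g(x, y) = f(x + y)."""
    return f[(x + y) % N]

vertices = range(N)
print(box_norm(g, vertices, vertices), u2_norm(f))
```

Using a complex-valued test function also checks that the conjugations in the two formulas line up.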

A similar argument can be used to relate the higher-degree uniformity norms to notions
of quasirandomness for $k$-uniform hypergraphs, which are like graphs except that instead
of having edges, which are pairs of vertices, one has hyperedges, which are $k$-tuples of
vertices. The following formula defines a norm on $k$-variable functions:

$$\|f\|_{\square^k}^{2^k} = \mathbb{E}_{x_1^0, x_1^1, \dots, x_k^0, x_k^1} \prod_{\epsilon \in \{0,1\}^k} C^{|\epsilon|} f(x_1^{\epsilon_1}, \dots, x_k^{\epsilon_k}).$$

If $f(x_1, \dots, x_k)$ has the form $g(x_1 + \dots + x_k)$, then $\|f\|_{\square^k} = \|g\|_{U^k}$.

A hypergraph $H$ of density $\delta$ behaves in many respects like a random hypergraph of
density $\delta$ if $\|H - \delta\|_{\square^k}$ is small enough. For instance, if $k = 3$, then the simplex density,
which is given by the expression

$$\mathbb{E}_{x,y,z,w} H(x, y, z)H(x, y, w)H(x, z, w)H(y, z, w),$$

is roughly $\delta^4$, or what it would be in the random case. More generally, if $\|H_i - \delta\|_{\square^3}$ is
small for $i = 1, 2, 3, 4$, then

$$\mathbb{E}_{x,y,z,w} H_1(x, y, z)H_2(x, y, w)H_3(x, z, w)H_4(y, z, w)$$

is again roughly $\delta^4$. This assertion, which is proved by repeated use of the Cauchy-Schwarz
inequality (see [G2] for a more general result), implies that

$$\mathbb{E}_{x,y,z,w} A(-3x - 2y - z)A(-2x - y + w)A(-x + z + 2w)A(y + 2z + 3w) \approx \delta^4$$

whenever $A$ is a subset of $\mathbb{Z}_N$ such that $\|A - \delta\|_{U^3}$ is small. But the four linear forms
above form an arithmetic progression of length 4 and common difference $x + y + z + w$:
this is a sketch of what turns out to be the most natural proof that the $U^3$-norm controls
arithmetic progressions of length 4.

These ideas can be developed to give a complete proof of Szemerédi's theorem: see

[MRS], [RS], m- 
2.5. Easy structure theorems for the $U^2$-norm.

A great deal of information about the $U^2$-norm comes from the following simple observation.

Lemma 2.4. Let $G$ be a finite Abelian group and let $f : G \to \mathbb{C}$. Then $\|f\|_{U^2} = \|\hat f\|_4$.

Proof. By the convolution identity and Parseval's identity,

$$\|f\|_{U^2}^4 = \mathbb{E}_{x+y=z+w} f(x)f(y)\overline{f(z)}\,\overline{f(w)} = \langle f * f, f * f\rangle = \langle \hat f^2, \hat f^2\rangle = \sum_\psi |\hat f(\psi)|^4.$$

The result follows on taking fourth roots. □

Now let us suppose that $\|f\|_2 \le 1$, and let us fix some small constant $\eta > 0$. Then
the number of characters $\psi$ such that $|\hat f(\psi)| \ge \eta$ is at most $\eta^{-2}$, since
$\sum_\psi |\hat f(\psi)|^2 = \|\hat f\|_2^2 = \|f\|_2^2 \le 1$. Using the inversion formula and this fact, we can split $f$ into two parts,
$\sum_{\psi \in K} \hat f(\psi)\psi$ and $\sum_{\psi \notin K} \hat f(\psi)\psi$, where $K$ is the set of all $\psi$ such that $|\hat f(\psi)| \ge \eta$. Let
us call these two parts $g$ and $h$, respectively. The function $g$ involves a bounded number
of characters, and characters are functions that we can describe completely explicitly.
Therefore, it can be thought of as the "structured" part of $f$. As for $h$, it is quasirandom,
since

$$\|h\|_{U^2}^4 = \|\hat h\|_4^4 \le \|\hat h\|_2^2 \|\hat h\|_\infty^2 \le \eta^2 \|f\|_2^2 \le \eta^2.$$
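The splitting just described is simple enough to carry out explicitly. The sketch below is a minimal illustration on the cyclic group of order $N$, assuming a 0/1-valued test function (so that its $L_2$ norm is at most 1); the threshold value and all names are illustrative. It also compares the $U^2$-norm of the quasirandom part computed from the definition with the $\ell_4$ norm of its Fourier transform, as in Lemma 2.4.

```python
import cmath

def fourier(f):
    """f_hat(r) = E_x f(x) omega^(-r x) on Z_N."""
    N = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * ((r * x) % N) / N)
                for x in range(N)) / N
            for r in range(N)]

def u2_norm(g):
    """||g||_{U^2} computed directly from the definition."""
    N = len(g)
    total = sum(g[x] * g[(x + a) % N].conjugate()
                * g[(x + b) % N].conjugate() * g[(x + a + b) % N]
                for x in range(N) for a in range(N) for b in range(N))
    return (max(total.real, 0.0) / N ** 3) ** 0.25

def split_by_large_coefficients(f, eta):
    """Return (K, g, h): K = {r : |f_hat(r)| >= eta}, g = part of f on K, h = f - g."""
    N = len(f)
    f_hat = fourier(f)
    K = [r for r in range(N) if abs(f_hat[r]) >= eta]
    g = [sum(f_hat[r] * cmath.exp(2j * cmath.pi * ((r * x) % N) / N) for r in K)
         for x in range(N)]
    h = [f[x] - g[x] for x in range(N)]
    return K, g, h

N = 12
f = [1 if x % 3 == 0 or x in (1, 7) else 0 for x in range(N)]  # 0/1-valued, ||f||_2 <= 1
eta = 0.2                                                      # illustrative threshold
K, g, h = split_by_large_coefficients(f, eta)
h_u2 = u2_norm(h)                                          # from the definition
h_hat_l4 = sum(abs(c) ** 4 for c in fourier(h)) ** 0.25    # l_4 norm on the dual side
print(K, h_u2, h_hat_l4)
```

Here the structured part picks up the subgroup-like component of the set, and the remainder satisfies the bound $\|h\|_{U^2}^4 \le \eta^2$ with room to spare.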

Unfortunately, this simple decomposition turns out not to be very useful, for reasons 
that we shall explain later. We mention it in order to put some of our later results in 
perspective. The same applies to the next result, which shows that one can obtain a 
much stronger relationship between $\|h\|_{U^2}$ and the upper bound on the size of $K$ if one
is prepared to tolerate a small L2-error as well. This result and its proof are part of the 
standard folklore of additive combinatorics. 

Proposition 2.5. Let $f$ be a function from a finite Abelian group $G$ to $\mathbb{C}$ and suppose that
$\|f\|_2 \le 1$. Let $\eta : \mathbb{R}_+ \to \mathbb{R}_+$ be a positive decreasing function that tends to 0 and let $\epsilon > 0$.
Then there is a positive integer $m$ such that $f$ can be written as $f_1 + f_2 + f_3$, where $f_1$ is
a linear combination of at most $m$ characters, $\|f_2\|_{U^2} \le \eta(m)$, and $\|f_3\|_2 \le \epsilon$.

Proof. Let $N = |G|$ and let us enumerate the dual group $\hat G$ as $\psi_1, \dots, \psi_N$ in such a way that
the absolute values of the Fourier coefficients $\hat f(\psi_i)$ are in non-increasing order. Choose an
increasing sequence of positive integers $m_1, m_2, \dots$ in such a way that $m_{r+1} \ge \eta(m_r)^{-4}$ for
every $r$.

Now let us choose $r$ and attempt to prove the result using the decomposition
$f_1 = \sum_{i \le m_r} \hat f(\psi_i)\psi_i$, $f_2 = \sum_{i > m_{r+1}} \hat f(\psi_i)\psi_i$ and
$f_3 = \sum_{m_r < i \le m_{r+1}} \hat f(\psi_i)\psi_i$. Then $f_1$ is a linear
combination of at most $m_r$ characters. Since there can be at most $m_{r+1}$ characters $\psi$ with
$|\hat f(\psi)| \ge m_{r+1}^{-1/2}$, we find that

$$\|f_2\|_{U^2}^4 = \|\hat f_2\|_4^4 \le m_{r+1}^{-1} \|\hat f_2\|_2^2 \le \eta(m_r)^4 \|f\|_2^2 \le \eta(m_r)^4.$$

Therefore, we are done if $\|f_3\|_2 \le \epsilon$. But the possible functions $f_3$ (as $r$ varies) are disjoint
parts of the Fourier expansion of $f$, so at most $\epsilon^{-2}$ of them can have norm greater than $\epsilon$.
Therefore, we can find $r \le \lceil \epsilon^{-2} \rceil$ such that the proposed decomposition works. □

There is nothing to stop us taking $m_1 = 1$. The proof then gives us the desired
decomposition for some $m$ that is bounded above by a number that results from starting with 1
and applying the function $t \mapsto \eta(t)^{-4}$ at most $\epsilon^{-2}$ times. Although Proposition 2.5 is still
not all that useful, it resembles other results that are, as we shall see in due course. In
those results, it is common to require $\eta(m)$ to be exponentially small: the resulting bound
is then of tower type.
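The proof is effectively an algorithm: sort the Fourier coefficients, build the sequence $m_1, m_2, \dots$, and stop at the first $r$ for which the middle block has small $L_2$ norm. The sketch below follows that recipe on the cyclic group of order $N$. It assumes the condition $m_{r+1} \ge \eta(m_r)^{-4}$ (the exponent that the final chain of inequalities requires), takes $m_1 = 1$, and uses a hypothetical choice $\eta(m) = (2m)^{-1/4}$ so that the sequence simply doubles; the function names and the test function are illustrative as well.

```python
import cmath
import math

def fourier(f):
    """f_hat(r) = E_x f(x) omega^(-r x) on Z_N."""
    N = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * ((r * x) % N) / N)
                for x in range(N)) / N
            for r in range(N)]

def decompose(f, eta, eps):
    """Return (m, f1, f2, f3) with f = f1 + f2 + f3, following the proof.

    Assumes ||f||_2 <= 1, eta positive and decreasing to 0, and eps > 0.
    """
    N = len(f)
    f_hat = fourier(f)
    order = sorted(range(N), key=lambda r: -abs(f_hat[r]))  # non-increasing |f_hat|

    def synthesize(indices):
        return [sum(f_hat[r] * cmath.exp(2j * cmath.pi * ((r * x) % N) / N)
                    for r in indices)
                for x in range(N)]

    m = 1  # m_1 = 1
    while True:
        m_next = max(m + 1, math.ceil(eta(m) ** -4))  # m_{r+1} >= eta(m_r)^(-4)
        middle = order[m:m_next]                      # positions m_r < i <= m_{r+1}
        # By Parseval, ||f3||_2^2 is the sum of |f_hat|^2 over the middle block.
        if sum(abs(f_hat[r]) ** 2 for r in middle) <= eps ** 2:
            return m, synthesize(order[:m]), synthesize(order[m_next:]), synthesize(middle)
        m = m_next

N = 32
f = [1 if (x * x) % N < 12 else 0 for x in range(N)]  # 0/1-valued, so ||f||_2 <= 1
eta = lambda m: (2 * m) ** -0.25                      # hypothetical: eta(m)^(-4) = 2m
eps = 0.3
m, f1, f2, f3 = decompose(f, eta, eps)
print(m)
```

With a rapidly decreasing $\eta$ the sequence $m_r$ grows extremely fast, which is the source of the tower-type bounds mentioned above; the slowly decaying choice here is only meant to keep the example small.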

2.6. Inverse theorems. 

A direct theorem in additive number theory is one that starts with a description of a 
set and uses that description to prove that the set has certain additive properties. For 
instance, the statement that every positive integer is the sum of four squares starts with 
the explicitly presented set $S$ of all perfect squares, and proves that the four-fold sumset
$S + S + S + S$ is the whole of $\mathbb{N} \cup \{0\}$. An inverse theorem is a result that goes in the
other direction: one starts with a set A that is assumed to have certain properties, and 
attempts to find some kind of description of A that explains those properties. Ideally, this 
description should be so precise that it actually characterizes the properties in question: a 
set A has the properties if and only if it satisfies the description. 

A remarkable inverse theorem, which lies at the heart of many recent results in additive 
combinatorics, is a theorem of Freiman [F] (later given a considerably more transparent 
proof by Ruzsa [Ru]) that characterizes sets that have small sumsets. If $A$ is a set of
$n$ integers, then it is easy to show that the sumset $A + A$ has size at least $2n - 1$ and
at most $n(n + 1)/2$. What can be said about $A$ if the size of the sumset is close to
its minimum, in the sense that $|A + A| \le C|A|$ for some fixed constant $C$? A simple
example of such a set is an arithmetic progression. A slightly less simple example is a set
$A$ that is contained in an arithmetic progression of length at most $Cn/2$. A less simple
example altogether is a "two-dimensional arithmetic progression": that is, a set of the form
$\{x_0 + rd_1 + sd_2 : 0 \le r < t_1, 0 \le s < t_2\}$. If $A$ is such a set, then $|A + A| \le 4|A|$, and
more generally if $A$ is a $k$-dimensional arithmetic progression (the definition of which is
easy to guess), then $|A + A| \le 2^k|A|$. As in the one-dimensional case, one can pass to large
subsets and obtain more examples. Freiman's theorem states that one has then exhausted
all examples.
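
The extreme sumset sizes and the two-dimensional example are easy to check by brute force. The following sketch uses small sets chosen purely for illustration (the specific progressions and the helper `sumset` are assumptions, not taken from the paper).

```python
def sumset(a):
    """Return the sumset A + A of a finite set of integers."""
    return {x + y for x in a for y in a}


n = 20

# A one-dimensional arithmetic progression attains the minimum 2n - 1.
progression = {3 + 7 * i for i in range(n)}

# Powers of two have all sums x + y (x <= y) distinct, giving n(n + 1)/2.
powers = {2 ** i for i in range(n)}

# A two-dimensional progression {r*d1 + s*d2 : 0 <= r < t1, 0 <= s < t2}.
t1, t2, d1, d2 = 5, 4, 1, 100
grid = {r * d1 + s * d2 for r in range(t1) for s in range(t2)}

for name, a in [("progression", progression), ("powers", powers), ("grid", grid)]:
    print(name, len(a), len(sumset(a)))
```

For the grid, the sumset is again a two-dimensional progression with side lengths $2t_1 - 1$ and $2t_2 - 1$, which is where the bound $|A + A| \le 4|A|$ comes from.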

Theorem 2.6. For every C there exist k and K such that every set A of n integers such 
that the sumset A + A has size at most Cn is contained in an arithmetic progression of 
dimension at most k and cardinality at most Kn. 

Freiman's theorem has been extremely influential, in large part because of Ruzsa's proof,
which was extremely elegant and conceptual, and gave much better bounds than Freiman's
argument. These bounds have subsequently been improved by Chang [C], who added
further interesting ingredients to Ruzsa's argument. A generalization of Freiman's theorem
to subsets of an arbitrary Abelian group was proved by Green and Ruzsa [GR].

The notion of an inverse theorem makes sense also for functions defined on Abelian 
groups. For instance, here is a simple inverse theorem about functions with large $U^2$-norm.

Proposition 2.7. Let $c > 0$, let $G$ be a finite Abelian group, let $f : G \to \mathbb{C}$ be a function
such that $\|f\|_2 \le 1$ and suppose that $\|f\|_{U^2} \ge c$. Then there exists a character $\psi$ such that
$|\langle f, \psi\rangle| \ge c^2$.

Proof. By Lemma 2.4 and our assumptions about $f$,

$$\|f\|_{U^2}^4 \le \|\hat{f}\|_\infty^2\|\hat{f}\|_2^2 \le \|\hat{f}\|_\infty^2,$$

which is what is claimed. □

Conversely, and without any assumption about $\|f\|_2$, if there exists a character $\psi$ such
that $|\langle f, \psi\rangle| \ge c$, then $\|f\|_{U^2} = \|\hat{f}\|_4 \ge c$. Therefore, correlation with a character
"explains" the largeness of the $U^2$-norm.

What about the $U^3$-norm? This turns out to be a much deeper question. As our remarks
earlier have suggested, quadratic functions come into play when one starts to think about
it. For example, if $f : \mathbb{Z}_N \to \mathbb{C}$ is the function $x \mapsto \omega^{rx^2}$ for some $r$ (where $\omega$ is once again
equal to $\exp(2\pi i/N)$), then the identity

$$x^2 - (x+a)^2 - (x+b)^2 - (x+c)^2 + (x+a+b)^2 + (x+a+c)^2 + (x+b+c)^2 - (x+a+b+c)^2 = 0$$

implies easily that $\|f\|_{U^3} = 1$. However, it is also easy to show that $f$ does not correlate
significantly with any character. Therefore, we are forced to consider quadratic functions.
If $q$ is a quadratic function, then let us call the function $\omega^q$ a quadratic phase function.
It is tempting to conjecture that a bounded function $f$ with $U^3$-norm at least $c$ must
correlate with a quadratic phase function, meaning that $|\mathbb{E}_x f(x)\omega^{q(x)}| \ge c'$ for some quadratic
function $q$ and some constant $c'$ that depends on $c$ only. However, although such a
correlation is a sufficient condition for the $U^3$-norm of $f$ to be large, it is not necessary,
because there are "multidimensional" examples. For instance, if $P$ is the two-dimensional
arithmetic progression $\{x_0 + rd_1 + sd_2 : 0 \le r < t_1, 0 \le s < t_2\}$, then we can define
something like a quadratic form $q$ on $P$ by the formula $q(x_0 + rd_1 + sd_2) = ar^2 + brs + cs^2$.
We can then define a function $f$ to be $\omega^{q(x)}$ when $x \in P$ and 0 otherwise. Let us call such a
function a generalized quadratic phase function. It is not hard to prove that such functions
have large $U^3$-norms, and that they do not have to correlate with ordinary quadratic phase
functions.
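
The two claims made above about a pure quadratic phase (that its $U^3$-norm is 1 while it has no large Fourier coefficient) can be checked directly for a small modulus. The sketch below assumes the standard eight-term definition of the $U^3$-norm on $\mathbb{Z}_N$, with complex conjugates on the terms that carry a minus sign in the identity; the values $N = 11$ and $r = 3$ and all function names are illustrative choices.

```python
import cmath
import math

N = 11  # an odd prime (illustrative choice)
r = 3   # any r not divisible by N


def omega(k):
    """omega^k with omega = exp(2*pi*i/N); the exponent is reduced mod N first."""
    return cmath.exp(2j * math.pi * (k % N) / N)


f = [omega(r * x * x) for x in range(N)]


def u3_norm(g):
    """Brute-force U^3 norm on Z_N, straight from the eight-term average."""
    n = len(g)
    total = 0
    for x in range(n):
        for a in range(n):
            for b in range(n):
                for c in range(n):
                    total += (
                        g[x]
                        * g[(x + a) % n].conjugate()
                        * g[(x + b) % n].conjugate()
                        * g[(x + c) % n].conjugate()
                        * g[(x + a + b) % n]
                        * g[(x + a + c) % n]
                        * g[(x + b + c) % n]
                        * g[(x + a + b + c) % n].conjugate()
                    )
    return abs(total / n ** 4) ** 0.125


def max_character_correlation(g):
    """Maximum over t of |E_x g(x) omega^(-t*x)|."""
    n = len(g)
    return max(
        abs(sum(g[x] * omega(-t * x) for x in range(n)) / n) for t in range(n)
    )


u3 = u3_norm(f)
corr = max_character_correlation(f)
```

For prime $N$ every Fourier coefficient of this $f$ is a normalised Gauss sum of modulus $N^{-1/2}$, so the correlation with any character tends to zero as $N$ grows even though the $U^3$-norm stays equal to 1.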

In [G1], the following "weak inverse theorem" was proved for all $U^k$-norms.

Theorem 2.8. Let $c > 0$ be a constant and let $f : \mathbb{Z}_N \to \mathbb{C}$ be a function such that
$\|f\|_\infty \le 1$ and $\|f\|_{U^k} \ge c$. Then there is a partition of $\mathbb{Z}_N$ into arithmetic progressions $P_i$
of length at least $N^{\alpha(c,k)}$, and for each $P_i$ there is a polynomial $q_i$ of degree at most $k$ such
that, writing $\kappa_i$ for the density $|P_i|/N$ of $P_i$, we have $\sum_i \kappa_i |\mathbb{E}_{x \in P_i} f(x)\omega^{q_i(x)}| \ge c/2$.

This result was the main step in the proof of Szemerédi's theorem given in [G1]. The
reason that this is a "weak inverse theorem" is that the converse is far from true. The result
shows that $f$ correlates with a function that is made out of many fragments of polynomial
phase functions, but it does not provide what one might hope for: correlation with a single
generalized polynomial phase function. However, the proof strongly suggested that such a
result should be true, and Green and Tao, by adding some important further ingredients,
have established a strong inverse theorem in the quadratic case [GT2]. Let us state their
result a little imprecisely.

Theorem 2.9. Let $c > 0$ be a constant and let $f : \mathbb{Z}_N \to \mathbb{C}$ be a function such that
$\|f\|_\infty \le 1$ and $\|f\|_{U^3} \ge c$. Then there exists a constant $c'$ that depends on $c$ only, and a
generalized quadratic phase function $g$, such that $|\langle f, g\rangle| \ge c'$.

2.7. Higher Fourier analysis. 

As we have seen, the $U^2$-norm of a function $f$ is equal to the $\ell_4$-norm of its Fourier transform,
and this observation leads quickly to a decomposition of functions into a structured
part and a quasirandom part. Is there a comparable result for the $U^3$-norm? The inverse
theorem of Green and Tao suggests that we should try to decompose $f$ into generalized
quadratic phase functions. However, there are far more than $N$ of these, so they do not
form an orthonormal basis, or indeed a basis of any kind. One might nevertheless hope
for some canonical way of decomposing a function, but it is far from clear that there is
one; certainly, nobody has come close to finding one.

However, one can still hope for a structure theorem that resembles Proposition 2.5. We
would expect it to say that a function $f$ can be decomposed into a linear combination of a
small number of generalized quadratic phase functions, plus a function with very small
$U^3$-norm, plus a function that is small in $L_2$. Green and Tao deduced such a result from their
inverse theorem, and thereby initiated a form of quadratic Fourier analysis. In [GW1], a
different method was given for deducing somewhat different decomposition theorems from
inverse theorems. The main ingredient of this method was the Hahn-Banach theorem:
the proof will be sketched in the next section. This gave an alternative form of quadratic
Fourier analysis, which provided much better bounds for the results of that paper (the
ones that concerned controlling systems of linear forms with $U^2$-norms).

We shall have more to say about higher Fourier analysis later in the paper. 

2.8. Easy structure theorems for graphs. 

We have already seen that the $U^2$-norm of the one-variable function $g$, defined on a finite
Abelian group $G$, can be regarded as the $GU^2$-norm of the two-variable function $f(x, y) =
g(x + y)$. The relationship does not stop here, however. If $\psi$ is a character, then, for any
$x$,

$$\mathbb{E}_y g(x + y)\psi(-y) = \psi(x)\mathbb{E}_y g(x + y)\psi(-x - y) = \hat{g}(\psi)\psi(x).$$

This shows that characters are similar to eigenvectors of the symmetric matrix $f(x, y)$,
except that they are mapped to multiples of their complex conjugates. However, it is
notable that the corresponding "eigenvalues" are the Fourier coefficients of the function $g$.
This observation suggests, correctly as it turns out, that eigenvalues play a similar role for
real symmetric matrices to the role played by Fourier coefficients for functions defined on
finite Abelian groups.
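
The displayed identity can be confirmed numerically on $\mathbb{Z}_N$. In the sketch below the matrix $f(x, y) = g(x + y)$ acts by averaging over $y$ rather than summing, and the Fourier coefficient is normalised as $\hat{g}(\psi) = \mathbb{E}_x g(x)\psi(-x)$; the random function `g` and the helper names are assumptions made for illustration.

```python
import cmath
import math
import random

random.seed(1)
N = 8
g = [random.uniform(-1, 1) for _ in range(N)]  # an arbitrary real function on Z_N


def psi(t, x):
    """The character x -> exp(2*pi*i*t*x/N) of Z_N."""
    return cmath.exp(2j * math.pi * ((t * x) % N) / N)


def g_hat(t):
    """Fourier coefficient hat g(psi_t) = E_x g(x) psi_t(-x)."""
    return sum(g[x] * psi(t, -x) for x in range(N)) / N


def apply_f(u):
    """(fu)(x) = E_y f(x, y) u(y) with f(x, y) = g(x + y)."""
    return [sum(g[(x + y) % N] * u[y] for y in range(N)) / N for x in range(N)]


# For each t, f maps the conjugate character y -> psi_t(-y) to hat g(psi_t) * psi_t.
max_error = 0.0
for t in range(N):
    image = apply_f([psi(t, -y) for y in range(N)])
    expected = [g_hat(t) * psi(t, x) for x in range(N)]
    max_error = max(max_error, max(abs(a - b) for a, b in zip(image, expected)))
```

The error is at the level of floating-point rounding, which reflects the fact that each conjugate character is sent to a multiple of the character itself, with the Fourier coefficient as the multiplier.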

We briefly illustrate this by proving a result that is analogous to Proposition 2.5. First,
we prove a well-known lemma relating the $GU^2$-norm to eigenvalues. It will tie in better
with our previous notation (and with applications of matrices to graph theory) if we
use a slightly unconventional association between matrices and linear maps, as we did
above. Given a matrix $f(x, y)$ and a function $u(y)$ we shall think of $fu(x)$ as the quantity
$\mathbb{E}_y f(x, y)u(y)$ rather than the same thing with a sum.

The finite-dimensional spectral theorem tells us that a real symmetric matrix $f(x, y)$
has an orthonormal basis of eigenvectors. If these are $u_1, \dots, u_n$ and the corresponding
eigenvalues are $\lambda_1, \dots, \lambda_n$, then we can express this by saying that

$$f(x, y) = \sum_i \lambda_i u_i \otimes u_i,$$

where $u \otimes v$ denotes the function $u(x)v(y)$. To see why these are the same, consider the effect
of each side in turn on a basis vector $u_j$. On the one hand, we have $\mathbb{E}_y f(x, y)u_j(y) = \lambda_j u_j(x)$
(by our unconventional definition of matrix multiplication) while on the other we have

$$\mathbb{E}_y \sum_i \lambda_i u_i \otimes u_i(x, y)u_j(y) = \sum_i \lambda_i u_i(x)\mathbb{E}_y u_i(y)u_j(y) = \sum_i \lambda_i u_i(x)\delta_{ij} = \lambda_j u_j(x)$$

by the orthonormality (with respect to the $L_2$-norm) of the eigenvectors.

Lemma 2.10. Let $X$ be a finite set and let $f$ be a symmetric real-valued function defined
on $X^2$. Let the eigenvalues of $f$ be $\lambda_1, \dots, \lambda_n$. Then $\|f\|_{GU^2}^4 = \sum_r \lambda_r^4$.

Proof. All results of this kind are proved by expanding the expression for $\|f\|_{GU^2}^4$ in terms
of the spectral decomposition $\sum_r \lambda_r u_r \otimes u_r$ of $f$.

$$\begin{aligned}
\mathbb{E}_{x,x'}\mathbb{E}_{y,y'} f(x, y)f(x, y')f(x', y)f(x', y')
&= \mathbb{E}_{x,x'}\mathbb{E}_{y,y'} \sum_{p,q,r,s} \lambda_p\lambda_q\lambda_r\lambda_s u_p(x)u_p(y)u_q(x)u_q(y')u_r(x')u_r(y)u_s(x')u_s(y') \\
&= \sum_{p,q,r,s} \lambda_p\lambda_q\lambda_r\lambda_s \delta_{pq}\delta_{pr}\delta_{rs}\delta_{qs} \\
&= \sum_p \lambda_p^4,
\end{aligned}$$

as claimed. □

A similar but easier proof establishes that $\|f\|_2^2 = \sum_r \lambda_r^2$.
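
Both identities can be checked by brute force on a small example in which the spectral decomposition is known by construction. The sketch below takes $X = \{0, \dots, 7\}$ and uses the Walsh functions as an orthonormal basis with respect to the averaged inner product $\mathbb{E}_x u(x)v(x)$; the eigenvalues are arbitrary illustrative numbers, and the four-fold average is the one expanded in the proof of Lemma 2.10.

```python
from itertools import product

n = 8  # X = {0, ..., 7}, identified with Z_2^3


def walsh(i, x):
    """Walsh function (-1)^<i, x>; orthonormal for <u, v> = E_x u(x) v(x)."""
    return -1.0 if bin(i & x).count("1") % 2 else 1.0


eigenvalues = [0.9, -0.5, 0.4, 0.3, 0.2, -0.1, 0.05, 0.0]  # illustrative values

# f = sum_i lambda_i u_i (x) u_i, a symmetric real-valued function on X^2.
f = [
    [sum(lam * walsh(i, x) * walsh(i, y) for i, lam in enumerate(eigenvalues))
     for y in range(n)]
    for x in range(n)
]

# Left-hand side of Lemma 2.10: E_{x,x'} E_{y,y'} f(x,y) f(x,y') f(x',y) f(x',y').
lemma_lhs = sum(
    f[x][y] * f[x][yp] * f[xp][y] * f[xp][yp]
    for x, xp, y, yp in product(range(n), repeat=4)
) / n ** 4

# ||f||_2^2 = E_{x,y} f(x,y)^2.
l2_squared = sum(f[x][y] ** 2 for x in range(n) for y in range(n)) / n ** 2

fourth_powers = sum(lam ** 4 for lam in eigenvalues)
squares = sum(lam ** 2 for lam in eigenvalues)
```

Because matrix multiplication is defined with an average, each Walsh function $u_i$ really is an eigenvector of $f$ with eigenvalue $\lambda_i$ here, so the comparison is a genuine instance of the lemma.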

The next result is a direct analogue for symmetric two-variable real functions (and
therefore in particular for graphs) of Proposition 2.5.

Proposition 2.11. Let $X$ be a finite set and let $f$ be a symmetric real-valued function on
$X^2$ such that $\|f\|_2 \le 1$. Let $\eta : \mathbb{R}_+ \to \mathbb{R}_+$ be a positive decreasing function that tends to 0
and let $\epsilon > 0$. Then there is a positive integer $m$ such that $f$ can be written as $f_1 + f_2 + f_3$,
where $f_1$ is a linear combination of at most $m$ orthonormal functions of the form $u \otimes u$,
$\|f_2\|_{GU^2} \le \eta(m)$, and $\|f_3\|_2 \le \epsilon$.

Proof. Let $n = |X|$ and let us enumerate an orthonormal basis $(u_i)_{i=1}^n$ of eigenvectors
of $f$ in such a way that the absolute values of the eigenvalues $\lambda_i$ are in non-increasing
order. Choose an increasing sequence of positive integers $m_1, m_2, \dots$ in such a way that
$m_{r+1} \ge \eta(m_r)^{-4}$ for every $r$.

Now let us choose $r$ and attempt to prove the result using the decomposition
$f_1 = \sum_{i \le m_r} \lambda_i u_i \otimes u_i$, $f_2 = \sum_{i > m_{r+1}} \lambda_i u_i \otimes u_i$, and $f_3 = \sum_{m_r < i \le m_{r+1}} \lambda_i u_i \otimes u_i$. Then $f_1$ is a
linear combination of at most $m_r$ eigenvectors, which are orthonormal to each other by the
spectral theorem. By the remark above about the sum of the squares of the eigenvalues,
there can be at most $m_{r+1}$ eigenvectors $u_i$ with $|\lambda_i| \ge m_{r+1}^{-1/2}$. Therefore,

$$\|f_2\|_{GU^2}^4 = \sum_{i > m_{r+1}} \lambda_i^4 \le m_{r+1}^{-1}\sum_i \lambda_i^2 = m_{r+1}^{-1}\|f\|_2^2 \le \eta(m_r)^4.$$

Therefore, we are done if $\|f_3\|_2 \le \epsilon$. But the possible functions $f_3$ (as $r$ varies) are disjoint
parts of the spectral expansion of $f$, so at most $\epsilon^{-2}$ of them can have norm greater than $\epsilon$.
Therefore, we can find $r \le \lceil \epsilon^{-2} \rceil$ such that the proposed decomposition works. □

The above result is closely related to a "weak regularity lemma" due to Frieze and
Kannan [FK].

3. The Hahn-Banach theorem and simple applications. 
Let us begin by stating the version of the Hahn-Banach theorem that we shall need. 

Theorem 3.1. Let $K$ be a convex body in $\mathbb{R}^n$ and let $f$ be an element of $\mathbb{R}^n$ that is not
contained in $K$. Then there is a constant $\beta$ and a non-zero linear functional $\phi$ such that
$\langle f, \phi\rangle \ge \beta$ and $\langle g, \phi\rangle \le \beta$ for every $g \in K$.

Now let us prove two corollaries, both of which are useful for proving decomposition 
theorems. 

Corollary 3.2. Let $K_1, \dots, K_r$ be closed convex subsets of $\mathbb{R}^n$, each containing 0, let
$c_1, \dots, c_r$ be positive real numbers and suppose that $f$ is an element of $\mathbb{R}^n$ that cannot be
written as a sum $f_1 + \dots + f_r$ with $f_i \in c_iK_i$. Then there is a linear functional $\phi$ such that
$\langle f, \phi\rangle > 1$ and $\langle g, \phi\rangle \le c_i^{-1}$ for every $i \le r$ and every $g \in K_i$.

Proof. Let $K$ be the convex body $\sum_i c_iK_i$. Our hypothesis is that $f \notin K$. Since $K$ is
closed, it follows that there exists $\epsilon > 0$ such that $(1 + \epsilon)^{-1}f \notin K$. Therefore, by Theorem
3.1 there is a constant $\beta$ and a linear functional $\phi$ such that $(1 + \epsilon)^{-1}\langle f, \phi\rangle \ge \beta$ and
$\langle g, \phi\rangle \le \beta$ for every $g \in K$. Again using the fact that $K$ is closed, we can add a small
Euclidean ball $B$ to $K$ in such a way that $(1 + \epsilon)^{-1}f \notin B + K$. Since $0 \in K$, it follows
that $\beta > 0$. Therefore, we can divide $\phi$ by $\beta$ and get $\beta$ to be 1, with the result that
$\langle f, \phi\rangle \ge 1 + \epsilon$. Since 0 belongs to each $K_i$, we can also conclude that $\langle g, \phi\rangle \le 1$ for
every $g \in c_iK_i$, which completes the proof. □

Corollary 3.3. Let $K_1, \dots, K_r$ be closed convex subsets of $\mathbb{R}^n$, each containing 0, and
suppose that $f$ is an element of $\mathbb{R}^n$ that cannot be written as a convex combination
$c_1f_1 + \dots + c_rf_r$ with $f_i \in K_i$. Then there is a linear functional $\phi$ such that $\langle f, \phi\rangle > 1$ and
$\langle g, \phi\rangle \le 1$ for every $i \le r$ and every $g \in K_i$.

Proof. Let $K$ be the set of all convex combinations $c_1f_1 + \dots + c_rf_r$ with $f_i \in K_i$. Then $K$
is a closed convex set and $f$ is not contained in $K$. Therefore, there exists $\epsilon > 0$ such
that $(1 + \epsilon)^{-1}f \notin K$. By Theorem 3.1 there is a functional $\phi$ and a constant $\beta$ such that
$(1 + \epsilon)^{-1}\langle f, \phi\rangle \ge \beta$ and $\langle g, \phi\rangle \le \beta$ whenever $g$ belongs to $K$. In particular, $\langle g, \phi\rangle \le \beta$
whenever $g$ belongs to one of the sets $K_i$. As in the proof of the previous corollary, $\beta$ must
be positive and can therefore be assumed to be 1. The result follows. □

Recall that if $\|.\|$ is a norm on $\mathbb{R}^n$, then the dual norm $\|.\|^*$ is defined by the formula
$\|\phi\|^* = \max\{\langle f, \phi\rangle : \|f\| \le 1\}$. If $f \in \mathbb{R}^n$ then Theorem 3.1 implies that there exists a
functional $\phi$ such that $\|\phi\|^* \le 1$ and $\langle f, \phi\rangle = \|f\|$. Such a functional is called a support
functional for $f$. In this paper it will be convenient to call $\phi$ a support functional if $\phi \ne 0$
and $\langle f, \phi\rangle = \|f\|\|\phi\|^*$, so that a positive scalar multiple of a support functional is also a
support functional.

The following lemma is useful in proofs that involve the Hahn-Banach theorem, as we
shall see in section 3.2. It tells us that the dual of an $\ell_1$-like combination of norms is an
$\ell_\infty$-like combination of their duals. We shall adopt the convention that if $\|.\|$ is a norm
defined on a subspace $V$ of $\mathbb{R}^n$ then its dual $\|.\|^*$ is the seminorm defined by the formula
$\|f\|^* = \max\{\langle f, g\rangle : g \in V, \|g\| \le 1\}$.

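Lemma 3.4. Let $\Sigma$ be a set and for each $\sigma \in \Sigma$, let $\|.\|_\sigma$ be a norm defined on a subspace
$V_\sigma$ of $\mathbb{R}^n$. Suppose that $\sum_{\sigma \in \Sigma} V_\sigma = \mathbb{R}^n$, and define a norm $\|.\|$ on $\mathbb{R}^n$ by the formula

$$\|x\| = \inf\{\|x_1\|_{\sigma_1} + \dots + \|x_k\|_{\sigma_k} : x_1 + \dots + x_k = x,\ \sigma_1, \dots, \sigma_k \in \Sigma\}.$$

Then this formula does indeed define a norm, and its dual norm $\|.\|^*$ is given by the formula

$$\|z\|^* = \max\{\|z\|_\sigma^* : \sigma \in \Sigma\}.$$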



Proof. It is a simple exercise to check that the expression does indeed define a norm.

Let us begin by supposing that $\|z\|_\sigma^* \ge 1$ for some $\sigma \in \Sigma$. Then there exists $x \in V_\sigma$
such that $\|x\|_\sigma \le 1$ and $|\langle x, z\rangle| \ge 1$. But then $\|x\| \le 1$ as well, from which it follows that
$\|z\|^* \ge 1$. Therefore, $\|z\|^*$ is at least the maximum of the $\|z\|_\sigma^*$.

Now let us suppose that $\|z\|^* > 1$. This means that there exists $x$ such that $\|x\| \le 1$
and $|\langle x, z\rangle| \ge 1 + \epsilon$ for some $\epsilon > 0$. Let us choose $x_1, \dots, x_k$ such that $x_i \in V_{\sigma_i}$ for each $i$,
$x_1 + \dots + x_k = x$, and $\|x_1\|_{\sigma_1} + \dots + \|x_k\|_{\sigma_k} < 1 + \epsilon$. Then

$$\sum_i |\langle x_i, z\rangle| > \|x_1\|_{\sigma_1} + \dots + \|x_k\|_{\sigma_k},$$

so there must exist $i$ such that $|\langle x_i, z\rangle| > \|x_i\|_{\sigma_i}$, from which it follows that $\|z\|_{\sigma_i}^* > 1$. This
proves that $\|z\|^*$ is at most the maximum of the $\|z\|_\sigma^*$. □

A particular case that will interest us is when $\Sigma$ is a subset of $\mathbb{R}^n$, for each $\sigma \in \Sigma$ the
subspace $V_\sigma$ is just the subspace generated by $\sigma$, and the norm on $V_\sigma$ is $\|\lambda\sigma\|_\sigma = |\lambda|$. The
dual seminorm is then $\|f\|_\sigma^* = |\langle f, \sigma\rangle|$. Thus, if we specialize Lemma 3.4 to this case then
we obtain the following corollary.

Corollary 3.5. Let $\Sigma \subset \mathbb{R}^n$ be a set that spans $\mathbb{R}^n$ and define a norm $\|.\|$ on $\mathbb{R}^n$ by the
formula

$$\|f\| = \inf\Big\{\sum_{i=1}^k |\lambda_i| : f = \sum_{i=1}^k \lambda_i\sigma_i,\ \sigma_1, \dots, \sigma_k \in \Sigma\Big\}.$$

Then this formula does indeed define a norm, and its dual norm $\|.\|^*$ is defined by the
formula $\|f\|^* = \max\{|\langle f, \sigma\rangle| : \sigma \in \Sigma\}$.
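
A tiny instance of Corollary 3.5 can be computed by hand or by machine. The sketch below works in $\mathbb{R}^2$ with the standard inner product and a hypothetical three-element structured set (not taken from the paper); it computes the norm by trying every pair of atoms, which suffices in the plane because the infimum is a linear programme with two equality constraints.

```python
import itertools
import random

# A hypothetical "structured" set spanning R^2 (illustrative, not from the paper).
SIGMA = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def norm(f):
    """Infimum of sum |lambda_i| over representations f = sum lambda_i sigma_i.

    With two equality constraints, the infimum is attained by a representation
    that uses two linearly independent atoms (one coefficient may be zero).
    """
    best = float("inf")
    for s, t in itertools.combinations(SIGMA, 2):
        det = s[0] * t[1] - s[1] * t[0]
        if abs(det) < 1e-12:
            continue  # s and t are parallel, so they do not form a basis
        a = (f[0] * t[1] - f[1] * t[0]) / det  # Cramer's rule for f = a*s + b*t
        b = (s[0] * f[1] - s[1] * f[0]) / det
        best = min(best, abs(a) + abs(b))
    return best


def dual_norm(phi):
    """Maximum of |<phi, sigma>| over sigma in SIGMA, as in Corollary 3.5."""
    return max(abs(dot(phi, s)) for s in SIGMA)


random.seed(2)
pairs = [
    ((random.uniform(-1, 1), random.uniform(-1, 1)),
     (random.uniform(-1, 1), random.uniform(-1, 1)))
    for _ in range(1000)
]
# <f, phi> <= ||f|| ||phi||* should hold for every pair.
weak_duality_holds = all(
    dot(f, phi) <= norm(f) * dual_norm(phi) + 1e-12 for f, phi in pairs
)
```

For example, $f = (1, 1)$ has norm 1 because it is itself an atom, and $\phi = (1/2, 1/2)$ has dual norm 1 with $\langle f, \phi\rangle = 1$, so $\phi$ is a support functional for $f$ in this toy setting.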

3.1. A simple structure theorem.

We now prove a very simple (and known) decomposition result that illustrates our basic 
method. 

Proposition 3.6. Let $\|.\|$ be any norm on $\mathbb{R}^n$ and let $f$ be any function in $\mathbb{R}^n$. Then $f$
can be written as $g + h$ in such a way that $\|g\| + \|h\|^* \le \|f\|_2$.

Proof. Suppose that the result is false. We shall apply Corollary 3.3 to the function $f/\|f\|_2$,
with $K_1$ and $K_2$ taken to be the unit balls of $\|.\|$ and $\|.\|^*$. Our hypothesis is equivalent to
the assertion that $f/\|f\|_2$ is not a convex combination $c_1g_1 + c_2g_2$ with $g_i \in K_i$ for $i = 1, 2$.
Therefore, we obtain a functional $\phi$ such that $\langle f, \phi\rangle > \|f\|_2$ and $\|\phi\|^*$ and $\|\phi\|$ are both at
most 1. But the first property implies, by the Cauchy-Schwarz inequality, that $\|\phi\|_2 > 1$,
while the second implies that $\|\phi\|_2^2 = \langle\phi, \phi\rangle \le \|\phi\|\|\phi\|^* \le 1$. This is a contradiction. □

A simple modification of Proposition 3.6 makes it a little more flexible. Suppose that we
wish to write $f$ as $g + h$ with $\|g\|$ small and $\|h\|^*$ not too large. If we define a new norm $|.|$
to be $\epsilon^{-1}\|.\|$, then $|.|^* = \epsilon\|.\|^*$. Applying Proposition 3.6 to these rescaled norms, we find
that we can write $f$ as $g + h$ in such a way that $\epsilon^{-1}\|g\| + \epsilon\|h\|^* \le \|f\|_2$. In particular, if
$\|f\|_2 = 1$, then $\|g\| \le \epsilon$ and $\|h\|^* \le \epsilon^{-1}$.
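
Proposition 3.6 is an existence statement, but for one concrete pair of norms the decomposition can be written down. The sketch below is an illustration under stated assumptions: it takes the norm to be the $\ell_\infty$-norm on $\mathbb{R}^n$ with the standard inner product, so that the dual norm is the $\ell_1$-norm, and it splits $f$ by clipping at a threshold $t$. For a fixed value of $\|g\|_\infty$, clipping minimises $\|f - g\|_1$, so searching over thresholds finds the best split, and the proposition predicts that its cost is at most $\|f\|_2$.

```python
import math
import random


def split(f, t):
    """Write f = g + h with g equal to f clipped to [-t, t] and h = f - g."""
    g = [max(-t, min(t, v)) for v in f]
    h = [v - u for v, u in zip(f, g)]
    return g, h


def cost(f, t):
    """||g||_inf + ||h||_1 for the split at threshold t."""
    g, h = split(f, t)
    return max(abs(u) for u in g) + sum(abs(v) for v in h)


def best_threshold(f):
    """The cost is convex and piecewise linear in t, so a minimiser is 0 or some |f_i|."""
    candidates = [0.0] + [abs(v) for v in f]
    return min(candidates, key=lambda t: cost(f, t))


random.seed(3)
f = [random.gauss(0, 1) for _ in range(50)]
t = best_threshold(f)
g, h = split(f, t)
lhs = max(abs(u) for u in g) + sum(abs(v) for v in h)  # ||g|| + ||h||*
rhs = math.sqrt(sum(v * v for v in f))                 # ||f||_2
```

Here $g$ is the bounded part and $h$ collects the few large entries of $f$, which is a toy version of the quasirandom-plus-structured split discussed in the text.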

The reason such a result might be expected to be useful in additive combinatorics is
that, as demonstrated in the previous section, we have a good supply of norms $\|.\|$ that
measure quasirandomness. Moreover, their duals, as we shall see later, can be thought
of as a sort of measure of structure. Perhaps the simplest example that illustrates this is
if we look at functions $f$ defined on finite Abelian groups, and take $\|f\|$ to be $\|\hat{f}\|_\infty$. If
$\|f\|_2 \le 1$, then

$$\|f\|_{U^2}^4 = \|\hat{f}\|_4^4 \le \|\hat{f}\|_2^2\|\hat{f}\|_\infty^2 \le \|\hat{f}\|_\infty^2,$$

a calculation we have already done. This shows that for functions with bounded $L_2$-norm
there is a rough equivalence between $\|f\|_{U^2}$ and $\|\hat{f}\|_\infty$, in the sense that if one is small
then so is the other.

Thus, if $\|f\|_2 \le 1$ then $\|f\| = \|\hat{f}\|_\infty$ being small tells us that $f$ is quasirandom. The
dual norm, $\|f\|^* = \|\hat{f}\|_1$, is a sort of measure of structure, since if $\|\hat{f}\|_1$ is at most $C$,
then $f$ is a small multiple of a convex combination of trigonometric functions, which can
be approximated in $L_2$ by a linear combination of a bounded number of such functions.
Thus, we recover a result that resembles Proposition 2.5. It is weaker, however, because
we have not yet related the quasirandomness constant to the structure constant by means
of an arbitrary function. However, this is easily done, again with the help of an $L_2$ error
term, as the next result shows.

Proposition 3.7. Let $f$ be a function in $\mathbb{R}^n$ with $\|f\|_2 \le 1$ and let $\|.\|$ be any norm on
$\mathbb{R}^n$. Let $\epsilon > 0$ and let $\eta : \mathbb{R}_+ \to \mathbb{R}_+$ be any decreasing positive function. Let $r = \lceil 2\epsilon^{-2} \rceil$
and define a sequence $C_1, \dots, C_r$ by setting $C_1 = 1$ and $C_i = 2\eta(C_{i-1})^{-1}$ when $i > 1$. Then
there exists $i \le r$ such that $f$ can be decomposed as $f_1 + f_2 + f_3$ with

$$C_i^{-1}\|f_1\|^* + \eta(C_i)^{-1}\|f_2\| + \epsilon^{-1}\|f_3\|_2 \le 1.$$

In particular, $\|f_1\|^* \le C_i$, $\|f_2\| \le \eta(C_i)$ and $\|f_3\|_2 \le \epsilon$.

Proof. If there is no such decomposition for $i$, then by Corollary 3.3 there is a functional
$\phi_i$ such that $\|\phi_i\| \le C_i^{-1}$, $\|\phi_i\|^* \le \eta(C_i)^{-1}$, $\|\phi_i\|_2 \le \epsilon^{-1}$ and $\langle\phi_i, f\rangle > 1$. If this is true for
every $i \le r$ then

$$\|\phi_1 + \dots + \phi_r\|_2 \ge \langle\phi_1 + \dots + \phi_r, f\rangle > r,$$

where the first inequality follows from Cauchy-Schwarz and the assumption that $\|f\|_2 \le 1$.
On the other hand, if $i < j$ then

$$\langle\phi_i, \phi_j\rangle \le \|\phi_i\|^*\|\phi_j\| \le \eta(C_i)^{-1}C_j^{-1} \le 1/2,$$

the last inequality following from the way we constructed the sequence $C_1, \dots, C_r$. Therefore,

$$\|\phi_1 + \dots + \phi_r\|_2^2 \le \epsilon^{-2}r + r(r - 1)/2.$$

This contradicts the previous estimate, since $r \ge 2\epsilon^{-2}$. □
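
The quantitative content of Proposition 3.7 lies in how fast the constants $C_i$ grow. The following sketch, which reads the statement as $r = \lceil 2\epsilon^{-2} \rceil$, $C_1 = 1$ and $C_i = 2\eta(C_{i-1})^{-1}$, computes the sequence for two hypothetical choices of $\eta$ (neither is taken from the paper); the function name `constants` is likewise an assumption.

```python
import math


def constants(eta, eps):
    """Return [C_1, ..., C_r] with r = ceil(2 / eps^2), C_1 = 1, C_i = 2 / eta(C_{i-1})."""
    r = math.ceil(2 / eps ** 2)
    cs = [1.0]
    for _ in range(r - 1):
        cs.append(2 / eta(cs[-1]))
    return cs


# eta(x) = 1/x: each step doubles the constant, so C_r is exponential in r.
polynomial = constants(lambda x: 1 / x, eps=0.5)

# eta(x) = exp(-x): each step exponentiates, so C_r is a tower of height about r.
# Only two terms are computed here because later terms overflow a float quickly.
exponential = constants(lambda x: math.exp(-x), eps=1.0)
```

The second choice illustrates why requiring $\eta$ to be exponentially small leads to bounds of tower type: the number of iterations is governed by $\epsilon$, and each iteration applies an exponential.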

It is easy to deduce Proposition 2.5 from this result. Of course, this fact on its own is
not a very convincing demonstration of the utility of the Hahn-Banach theorem, since for
the norm $\|f\| = \|\hat{f}\|_\infty$ it is easy to write down an explicit decomposition of $f$, as we saw in
section 2.5. But there are other important norms where this is certainly not the case. For
example, if we take $\|f\|$ to be $\|f\|_{U^k}$ for a larger $k$, then there is no obvious decomposition
of $f$ from which we can read off three functions $f_1$, $f_2$ and $f_3$ with the required properties.

3.2. Deducing decomposition theorems from inverse theorems. 

The main result of [GW1] shows that certain linear configurations occur with the "expected"
frequency in any set $A$ for which the balanced function $A - \delta$ (where $\delta$ is the
density of $A$) has sufficiently small $U^2$-norm. The interest in the result is that the $U^2$-norm
suffices for the configurations in question, whereas the natural arguments that generalize
the proof that the $U^k$-norm controls progressions of length $k + 1$ would suggest that the
$U^3$-norm was needed. In order to prove the result, a form of quadratic Fourier analysis
was needed, as we have already mentioned. The approach in [GW1] was to apply directly
a result of Green and Tao, which obtains a decomposition of a bounded function $f$ by
constructing an averaging projection $P$ with the property that $f - Pf$ has small $U^3$-norm.
However, there was a technicality involved that forced us to use an iterated version of their
result that gives rise to very weak bounds. In order to obtain reasonable bounds for the
problem, it turned out to be convenient (indeed, as far as we could tell, necessary) to
prove a decomposition theorem that could be regarded as a quadratic analogue of Proposition
2.5, with the important difference that the strong dependence of $\eta(m)$ on $m$ was not
needed. (This was why it was possible to obtain good bounds.)

The argument appears in [GW2], and it can be regarded as a special case of a general
principle that can be informally summarized as follows: to each inverse theorem there is
a corresponding decomposition theorem. It is possible to give a formal statement, as will
be clear from our discussion, but in practice it is much easier to describe a method for 
deducing decompositions from inverse theorems than it is to state an artificial lemma that 
declares that the method works. The main reason for this is that when one applies the 
method, one typically starts with the decomposition one wants to prove and the inverse 
theorem one can prove, and adjusts the former until it follows from the latter. We shall 
reflect this in our discussion below: more precisely, we shall assume that a decomposition 
of a certain general kind does not exist, draw an easy consequence from this, and see when 
this consequence contradicts any given inverse theorem. 

Suppose, then, that we have a subset $\Sigma \subset \mathbb{R}^n$ of functions that we regard as "structured",
and suppose that the functions in $\Sigma$ span $\mathbb{R}^n$. Suppose also that we have another function
$f$ that we would ideally like to decompose as a linear combination $\sum_{i=1}^k \lambda_i\sigma_i$ of functions
$\sigma_i \in \Sigma$ with $\sum_{i=1}^k |\lambda_i|$ not too large, together with some error terms. That is, we look for
a result of the following kind.

Hoped-for decomposition. The function $f$ can be written in the form

$$f = \sum_{i=1}^k \lambda_i\sigma_i + g_1 + \dots + g_r,$$

where $\sum_{i=1}^k |\lambda_i| \le M$, each $\sigma_i$ belongs to $\Sigma$, and for each $j \le r$ we have an inequality of
the form $\|g_j\|_{(j)} \le \eta_j$.

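Typically, $r$ will be a very small integer such as 2.

Lemma 3.4 says that the formula

$$\|g\| = \inf\Big\{\sum_{i=1}^k |\lambda_i| : g = \sum_{i=1}^k \lambda_i\sigma_i,\ \sigma_i \in \Sigma\Big\}
= \inf\Big\{\sum_{i=1}^k \|g_i\|_{\sigma_i} : g = g_1 + \dots + g_k,\ \sigma_1, \dots, \sigma_k \in \Sigma,\ g_i \in V_{\sigma_i}\Big\}$$

defines a norm, and that the dual of this norm is the norm

$$\|\phi\|^* = \max_{\sigma \in \Sigma} |\langle\sigma, \phi\rangle|.$$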

Now let us suppose that no decomposition of the kind we are looking for exists. This is
equivalent to the assumption that $f$ has no decomposition of the form $g_0 + g_1 + \dots + g_r$
with $\|g_0\| \le M$ and $\|g_i\|_{(i)} \le \eta_i$ for every $i$. If this is the case, then by Corollary 3.2 there
must be a linear functional $\phi$ such that $\langle f, \phi\rangle > 1$, $\|\phi\|^* \le M^{-1}$ and $\|\phi\|_{(i)}^* \le \eta_i^{-1}$ for
$i = 1, 2, \dots, r$.

The statement that $\|\phi\|^* \le M^{-1}$ tells us that $|\langle\sigma, \phi\rangle| \le M^{-1}$ for every $\sigma \in \Sigma$. Thus,
what we would like is an inverse theorem that concludes the opposite: that there must
be some $\sigma \in \Sigma$ such that $|\langle\sigma, \phi\rangle| > M^{-1}$. Before we think about this, let us list the
assumptions that we have at our disposal.

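Consequences of failure of decomposition. Suppose that there is no decomposition
$f = \sum_{i=1}^k \lambda_i\sigma_i + g_1 + \dots + g_r$ such that $\sum_{i=1}^k |\lambda_i| \le M$, each $\sigma_i$ belongs to $\Sigma$, and $\|g_j\|_{(j)} \le \eta_j$
for each $j \le r$. Then there exists $\phi \in \mathbb{R}^n$ such that

(i) $|\langle\sigma, \phi\rangle| \le M^{-1}$ for every $\sigma \in \Sigma$;

(ii) $\langle f, \phi\rangle > 1$;

(iii) $\|\phi\|_{(j)}^* \le \eta_j^{-1}$ for $j = 1, 2, \dots, r$.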

The assumptions of an inverse theorem are typically that $f$ is not too big in one norm,
such as, for instance, the $L_\infty$-norm, but not too small in another, such as the $U^2$-norm. The
only information we have that could possibly imply a lower bound on any norm of $\phi$ is the
inequality $\langle f, \phi\rangle > 1$, and even that does not help unless we have an upper bound on some
norm of $f$. (Of course, it is hardly surprising that such a bound would be required for a
theorem that allows us to decompose $f$ into a bounded combination of bounded functions.)

So let us suppose that we have an inverse theorem of the following form. (We have
introduced the constant $K$ to allow us to multiply by an arbitrary non-zero scalar.)

Putative Inverse Theorem. Let $\phi \in \mathbb{R}^n$ be a function such that $\|\phi\| \le K$ and $|||\phi||| \ge \epsilon$.
Then there exists $\sigma \in \Sigma$ such that $|\langle\sigma, \phi\rangle| \ge Kc(\epsilon/K)$.

This will be contradicted under the following circumstances:

(a) the upper bounds $\|\phi\|_{(j)}^* \le \eta_j^{-1}$ imply that $\|\phi\| \le K$;

(b) the upper bounds on the $\|\phi\|_{(j)}^*$, an upper bound on some norm of $f$, and the lower
bound $\langle f, \phi\rangle > 1$, together imply that $|||\phi||| \ge \epsilon$;

(c) $M^{-1} < Kc(\epsilon/K)$.

For example, suppose that $M^{-1} < \eta^{-1}c(\epsilon\eta)$ and we would like a decomposition
$f = \sum_{i=1}^k \lambda_i\sigma_i + g + h$ with $\sum_{i=1}^k |\lambda_i| \le M$, $|||g||| \le \epsilon$ and $\|h\|^* \le \eta$. If such a decomposition
does not exist, then we obtain $\phi$ such that $|\langle\sigma, \phi\rangle| < \eta^{-1}c(\epsilon\eta)$ for every $\sigma \in \Sigma$,
$|||\phi|||^* \le \epsilon^{-1}$, $\|\phi\| \le \eta^{-1}$ and $\langle f, \phi\rangle > 1$. If we also know that $\|f\|_2 \le 1$, then it follows
that $\|\phi\|_2 > 1$. But since $\|\phi\|_2^2 \le |||\phi||| \cdot |||\phi|||^*$, it follows that $|||\phi||| \ge \epsilon$. This contradicts
the inverse theorem (with $K = \eta^{-1}$).

If we know a little bit more about $f$, then we can obtain a correspondingly stronger
result. For instance, suppose that we know that $|||f|||^* \le \epsilon^{-1}$. Then the bound $\langle f, \phi\rangle > 1$
immediately implies that $|||\phi||| > \epsilon$, so we do not need the error term $g$ in the decomposition.

Decomposition results obtained by the simple argument above (just assume that a
decomposition doesn't exist, apply Hahn-Banach, and contradict an inverse theorem) can be
very useful. However, in order to use them one has to do a little more work. For example,
it is not usually trivial that a sum of the form $\sum_{i=1}^k \lambda_i\sigma_i$ is "structured", even if the sum
$\sum_{i=1}^k |\lambda_i|$ is smallish and all the individual functions $\sigma_i$ are highly structured. The difficulty
is that $k$ may be very large, and in order to deal with it one tends to need a principle that
says that functions $\sigma_i$ are either "closely related" or "far apart". A simple example is when
$\Sigma$ is the set of all characters, in which case any two elements of $\Sigma$ are either identical or
orthogonal. In [GW2] a lemma was proved to the effect that two generalized quadratic
phases were either "linearly related" or "approximately orthogonal". That made it possible
to replace the linear combination by a much smaller linear combination of slightly more
general functions.

A second point is that one sometimes wants more information about the "structured
function" $f_1 = \sum_{i=1}^k \lambda_i\sigma_i$. For instance, if $\|f\|_\infty \le 1$ it can be extremely helpful to know
that $\|f_1\|_\infty \le 1$ as well. This does not come directly out of the method above, but it does
when we combine that method with methods that we shall discuss in the next section.

Just before we finish this section, we observe that inverse theorems can be used to prove
strengthened decomposition theorems as well: that is, ones where some of the $\eta_i$ can be
made to depend on $M$. Suppose, for example, that our inverse theorem tells us that
whenever $\|\phi\|_\infty \le 1$ and $\|\phi\| \ge \epsilon$ there must exist $\sigma \in \Sigma$ such that $|\langle\sigma, \phi\rangle| \ge c(\epsilon)$. Suppose
also that (as often happens) $\|f\|^* \ge \|f\|_\infty$ for every $f \in \mathbb{R}^n$. Now let $f$ be a function with
$\|f\|_2 \le 1$ and use Proposition 3.7 to write $f$ as $f_1 + f_2 + f_3$ with $\|f_1\|^* \le C$, $\|f_2\| \le \eta(C)$
and $\|f_3\|_2 \le \theta$. In our discussion just after the statement of the putative inverse theorem,
we observed that knowing that $\|f_1\|^* \le C$ would yield a decomposition $f_1 = \sum_{i=1}^k \lambda_i\sigma_i + h$,
where $\sum_{i=1}^k |\lambda_i| \le c(\theta C^{-1})$ (taking $\epsilon = C^{-1}$, $K = C$, and replacing $\eta$ by $\theta$), and $\|h\|_1 \le \theta$.
Therefore, we can decompose $f$ as $\sum_{i=1}^k \lambda_i\sigma_i + f_2 + f_3 + h$, with $\sum_{i=1}^k |\lambda_i| \le c(\theta C^{-1})$,
$\|f_2\| \le \eta(C)$, and $\|f_3 + h\|_1 \le 2\theta$. Since $\eta$ is an arbitrary function, we can make it depend
in an arbitrary way on $c(\theta C^{-1})$. Thus, we have obtained the following result. (Note that
the constants and functions are not the same as the constants and functions with the same
names in the discussion that has just finished.)

Theorem 3.8. Let $\Sigma$ be a subset of $\mathbb{R}^n$ that spans $\mathbb{R}^n$. Let $\|.\|$ be a norm such that
$\|f\|_\infty \le \|f\|^*$ for every $f \in \mathbb{R}^n$. Suppose that for every function $f$ with $\|f\|_\infty \le 1$ and
$\|f\| \ge \epsilon$ there exists $\sigma \in \Sigma$ such that $|\langle f, \sigma\rangle| \ge c(\epsilon)$. Let $\theta > 0$ and let $\eta$ be a decreasing
function from $\mathbb{R}_+$ to $\mathbb{R}_+$. Then there exists a constant $C_0$, depending on $\eta$ and $\theta$ only, such
that every function $f \in \mathbb{R}^n$ with $\|f\|_2 \le 1$ has a decomposition

$$f = \sum_{i=1}^k \lambda_i\sigma_i + f_2 + f_3$$

with the following property: each $\sigma_i$ belongs to $\Sigma$ and there is a constant $C \le C_0$ such that
$\sum_{i=1}^k |\lambda_i| \le C$, $\|f_2\| \le \eta(C)$, and $\|f_3\|_1 \le \theta$.

4. The positivity and boundedness problems. 

Although our results so far are sometimes useful, they have a serious limitation. Suppose, for example, that we wish to use Proposition 3.7. What we would like to do is use the structural properties of $h$ to prove that certain quantities, such as $\mathbb{E}_{x,d}h(x)h(x+d)h(x+2d)$, are large, and then to show that $f = g + h$ is a "random enough" perturbation of $h$ for $\mathbb{E}_{x,d}f(x)f(x+d)f(x+2d)$ to be large as well. But even if $h$ is a small linear combination of just a few trigonometric functions, there is no particular reason for $\mathbb{E}_{x,d}h(x)h(x+d)h(x+2d)$ to be large. If we want it to be large, then we need additional assumptions. The most useful one in practice is positivity.
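To make the last point concrete, here is a small numerical sketch in Python. It is only an illustration under stated assumptions: the group is taken to be the cyclic group of order 101, and the helper name `three_ap_average` is introduced here and does not come from the paper. For a single cosine the three-term progression average vanishes, whereas adding the constant 1 (which makes the function non-negative) brings the average up to about 1.

```python
import numpy as np

def three_ap_average(h):
    """Return E_{x,d} h(x) h(x+d) h(x+2d) over the cyclic group Z_N."""
    n = len(h)
    idx = np.arange(n)
    total = 0.0
    for d in range(n):
        total += np.sum(h[idx] * h[(idx + d) % n] * h[(idx + 2 * d) % n])
    return total / n**2

N = 101  # hypothetical modulus, chosen only for illustration
x = np.arange(N)
h_cos = np.cos(2 * np.pi * x / N)  # a single real trigonometric function
h_pos = 1.0 + h_cos                # a non-negative function with the same oscillation

avg_cos = three_ap_average(h_cos)  # zero up to rounding error
avg_pos = three_ap_average(h_pos)  # close to 1
print(avg_cos, avg_pos)
```

The reason is visible on the Fourier side: a triple of frequencies $(r,s,t)$ contributes only if $r+s+t=0$ and $s+2t=0$, and no such triple exists with all three frequencies equal to $\pm 1$.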

Suppose that $f \in \mathbb{R}^n$ is a function with $\|f\|_2 \le 1$ and that it takes non-negative values. With an appropriate choice of norm $\|.\|$, Proposition 3.7 allows us to decompose $f$ into a "structured part", a "quasirandom part" and a small $L_2$ error. One's intuition suggests that the structured part of a non-negative function should not need to take negative values, and this turns out to be correct for the norms discussed in Section 2.3.

In Section 5 we shall prove a very general result of this kind. In this section, we shall
prove some simpler results that illustrate the method of polynomial approximations; we 
shall use this method repeatedly later. 

4.1. Algebra norms, polynomial approximation and a first transference theorem.

To begin with, we need a definition that will pick out the class of norms for which we can prove results. Actually, for now we shall give a definition that is not always broad enough to be useful. In the next section we shall define a broader class of norms to which the method still applies.

Definition. Let $X$ be a finite set. An algebra norm on $\mathbb{R}^X$ is a norm $\|.\|$ such that $\|fg\| \le \|f\|\|g\|$ for any two functions $f$ and $g$, and $\|1\| = 1$.

A good example of an algebra norm (indeed, the central example) is the $\ell_1$-norm of the Fourier transform of $f$, which has the submultiplicativity property because

\[ \|\widehat{fg}\|_1 = \|\hat{f} * \hat{g}\|_1 \le \|\hat{f}\|_1\|\hat{g}\|_1. \]

The predual of this norm (it is of course the dual as well but we shall be thinking of it as the primary norm and the algebra norm as its dual) is the $\ell_\infty$-norm of $\hat{f}$, which, as we have already seen, is in a crude sense equivalent to the $U^2$-norm for many functions of interest.
We shall use the following simple lemma repeatedly. 

Lemma 4.1. Let $\|.\|$ be a norm on $\mathbb{R}^n$ such that the dual norm $\|.\|^*$ is an algebra norm. Then $\|f\| \ge |\mathbb{E}_x f(x)|$ and $\|f\|^* \ge \|f\|_\infty$ for every function $f$.

Proof. Since $\|.\|^*$ is an algebra norm, $\|1\|^* = 1$, so $\|f\| \ge |\langle f,1\rangle| = |\mathbb{E}_x f(x)|$.

For the second part, if $\|f\|^* \le 1$ then $\|f^n\|^* \le 1$ for every $n$. It follows that $\|f\|_\infty \le 1$, since otherwise at least one coordinate of $f^n$ would be unbounded. Therefore, $\|f\|_\infty \le \|f\|^*$ for every $f$. □

The Weierstrass approximation theorem tells us that every continuous function on a 
closed bounded interval can be uniformly approximated by polynomials. It will be helpful 
to define a function connected with this result. Given a real polynomial $P$, let $R_P$ be the polynomial obtained from $P$ by replacing all the coefficients of $P$ by their absolute values. If $J : \mathbb{R} \to \mathbb{R}$ is a continuous function, $C$ is a positive real number and $\delta > 0$, let $\rho(C,\delta,J)$ be twice the infimum of $R_P(C)$ over all polynomials $P$ such that $|P(x) - J(x)| \le \delta$ for every $x \in [-C,C]$. So that it will not be necessary to remember the definition of $\rho(C,\delta,J)$ we now state and prove a simple but very useful lemma.

Lemma 4.2. Let $\|.\|^*$ be an algebra norm, let $J : \mathbb{R} \to \mathbb{R}$ be a continuous function and let $C$ and $\delta$ be positive real numbers. Then there exists a polynomial $P$ such that $\|P\phi - J\phi\|_\infty \le \delta$ and $\|P\phi\|^* \le \rho(C,\delta,J)$ for every $\phi \in \mathbb{R}^n$ such that $\|\phi\|^* \le C$.

Proof. It is immediate from the definition of $\rho(C,\delta,J)$ that for every $C$ and every $\delta > 0$ there exists a polynomial $P$ such that $|P(x) - J(x)| \le \delta$ for every $x \in [-C,C]$, and such that $R_P(C) \le \rho(C,\delta,J)$.

Now let $\phi \in \mathbb{R}^n$ be a function with $\|\phi\|^* \le C$. Then $\|\phi\|_\infty \le C$ as well, since $\|.\|^*$ is an algebra norm. Since $P$ and $J$ agree to within $\delta$ on $[-C,C]$, it follows that $\|P\phi - J\phi\|_\infty \le \delta$.

Suppose that $P$ is the polynomial $P(x) = a_nx^n + \dots + a_1x + a_0$. Then, by the triangle inequality and the algebra property of $\|.\|^*$,

\begin{align*}
\|P\phi\|^* &\le |a_n|\|\phi^n\|^* + \dots + |a_1|\|\phi\|^* + |a_0| \\
&\le |a_n|(\|\phi\|^*)^n + \dots + |a_1|\|\phi\|^* + |a_0| \\
&= R_P(\|\phi\|^*).
\end{align*}

Since the coefficients of $R_P$ are all non-negative, this is at most $R_P(C)$, which is at most $\rho(C,\delta,J)$, by our choice of $P$. □

In more qualitative terms, the above lemma tells us that if $\phi$ is bounded in an algebra norm and we compose it with an arbitrary continuous function $J$, then the resulting function $J\phi$ can be uniformly approximated by functions that are still bounded in the algebra norm.
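The mechanism of Lemma 4.2 can be tried out numerically. The following Python sketch is an illustration only, under several assumptions that are not in the paper: the algebra norm is taken to be the $\ell_1$-norm of the normalized Fourier transform on a cyclic group of order 64, $J(x) = (x+|x|)/2$, $C = 2$, and a degree-8 least-squares fit stands in for a near-optimal polynomial in the definition of $\rho$. The names `algebra_norm` and `R` are hypothetical helpers.

```python
import numpy as np
from numpy.polynomial import Polynomial

def algebra_norm(phi):
    """l1-norm of the normalized Fourier transform on Z_N (an algebra norm)."""
    return np.sum(np.abs(np.fft.fft(phi))) / len(phi)

def R(P, t):
    """Evaluate R_P(t): P with every coefficient replaced by its absolute value."""
    return Polynomial(np.abs(P.coef))(t)

def J(t):
    """Positive part of t."""
    return (t + np.abs(t)) / 2

C = 2.0
grid = np.linspace(-C, C, 2001)
# Least-squares fit on [-C, C], converted to ordinary power-basis coefficients.
P = Polynomial.fit(grid, J(grid), deg=8).convert()

rng = np.random.default_rng(0)
phi = rng.standard_normal(64)
phi *= C / algebra_norm(phi)  # rescale so that the algebra norm of phi equals C

sup_error = np.max(np.abs(P(phi) - J(phi)))  # pointwise distance between P(phi) and J(phi)
lhs = algebra_norm(P(phi))                   # algebra norm of the composition P(phi)
rhs = R(P, C)                                # the bound R_P(C) from the proof
print(sup_error, lhs, rhs)
```

The inequality `lhs <= rhs` is exactly the chain of estimates in the proof; the sketch does not attempt to compute the infimum that defines $\rho$.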

The next result is our first transference theorem of the paper. It tells us that if $\mu$ and $\nu$ are non-negative functions on a set $X$ and they are sufficiently close in an appropriate norm, then any non-negative function that is dominated by $\mu$ can be "transferred to" (that is, approximated by) a non-negative function that is dominated by $\nu$. We shall apply this principle in Section 5. As we shall see later in this section, it is also not hard to generalize the result to obtain a generalized version of the Green-Tao transference theorem.

Theorem 4.3. Let $\mu$ and $\nu$ be non-negative functions on a set $X$ and suppose that $\|\mu\|_1$ and $\|\nu\|_1$ are both at most 1. Let $\eta,\delta > 0$, let $J : \mathbb{R} \to \mathbb{R}$ be the function given by $J(x) = (x + |x|)/2$ and let $\epsilon = \delta/2\rho(\eta^{-1},\delta/4,J)$. Let $\|.\|$ be a norm on $\mathbb{R}^X$ such that the dual norm $\|.\|^*$ is an algebra norm and suppose that $\|\mu - \nu\| \le \epsilon$. Then for every function $f$ with $0 \le f \le \mu$ there exists a function $g$ such that $0 \le g \le \nu(1-\delta)^{-1}$ and $\|f - g\| \le \eta$.

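Proof. An equivalent way of stating the conclusion is that $f = g + h$ with $0 \le g \le \nu(1-\delta)^{-1}$ and $\|h\| \le \eta$. Thus, if the result is false then we can find a functional $\phi$ such that $\langle f,\phi\rangle > 1$, but $\langle g,\phi\rangle \le 1$ for every $g$ such that $0 \le g \le \nu(1-\delta)^{-1}$, and $\|\phi\|^* \le \eta^{-1}$.

The first condition on $\phi$ is equivalent to the statement that $\langle \nu,\phi_+\rangle \le 1-\delta$. To see this, note that for any $\phi$, the $g$ that maximizes $\langle g,\phi\rangle$ takes the value 0 when $\phi(x) < 0$ and $\nu(x)(1-\delta)^{-1}$ when $\phi(x) \ge 0$, in which case $\langle g,\phi\rangle = (1-\delta)^{-1}\langle \nu,\phi_+\rangle$.

Now $\phi_+$ is equal to $J\phi$. Since $\|.\|^*$ is an algebra norm, we can apply Lemma 4.2 and obtain a polynomial $P$ such that $\|P\phi - \phi_+\|_\infty \le \delta/4$ and $\|P\phi\|^* \le R_P(C) \le \rho(\eta^{-1},\delta/4,J)$, which we shall abbreviate to $\rho$.

Since $\langle \nu,\phi_+\rangle \le 1-\delta$ and $\|\nu\|_1 \le 1$, it follows that $\langle \nu,P\phi\rangle \le 1-3\delta/4$. Since $\|P\phi\|^* \le \rho$ and $\|\mu - \nu\| \le \epsilon$, it follows that $\langle \mu,P\phi\rangle \le 1-3\delta/4+\epsilon\rho$. Since $\|\mu\|_1 \le 1$, it follows that $\langle \mu,\phi_+\rangle \le 1-\delta/2+\epsilon\rho$. Since $f \le \mu$ it follows that $\langle f,\phi_+\rangle \le 1-\delta/2+\epsilon\rho$, and since $f \ge 0$ it follows that $\langle f,\phi\rangle \le 1-\delta/2+\epsilon\rho$. By the choice of $\epsilon$ we have $\epsilon\rho = \delta/2$, so $\langle f,\phi\rangle \le 1$, which is a contradiction. □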
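The step in the proof that identifies the maximizing $g$ is easy to check numerically. The sketch below is only an illustration: the data are random, the inner product is assumed to be the averaging one, $\langle u,v\rangle = \mathbb{E}_x u(x)v(x)$, and the helper name `inner` is introduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
n, delta = 50, 0.1
nu = rng.random(n)            # a non-negative function (hypothetical data)
phi = rng.standard_normal(n)  # an arbitrary functional

def inner(u, v):
    """<u, v> = E_x u(x) v(x) (normalization assumed for this sketch)."""
    return np.mean(u * v)

upper = nu / (1 - delta)                 # the pointwise upper bound on g
g_best = np.where(phi >= 0, upper, 0.0)  # the maximizer described in the proof
phi_plus = np.maximum(phi, 0.0)

best_value = inner(g_best, phi)
closed_form = inner(nu, phi_plus) / (1 - delta)

# Random admissible competitors 0 <= g <= upper never do better than g_best.
trials = [inner(upper * rng.random(n), phi) for _ in range(1000)]
print(best_value, closed_form, max(trials))
```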

4.2. Approximate duality and algebra-like structures. 

As the previous section shows, we can carry out polynomial-approximation arguments 
when we are looking at a norm $\|.\|$ for which the dual norm $\|.\|^*$ is an algebra norm. A
key insight of Green and Tao (which has received less comment than other aspects of their
proof) is that one can carry out polynomial-approximation arguments under hypotheses
that are weaker in two respects: one can use pairs of norms that are not precisely dual to 
each other, and the norm that measures structure can have much weaker properties than 
those of an algebra norm. It is not hard to generalize the arguments in an appropriate 
way: the insight was to see that there were important situations in which one could obtain 
the weaker hypotheses even when the stronger ones were completely false. 

To see why this might be, think once again about the one algebra norm we have so far considered, namely $\|\hat{f}\|_1$, and its predual $\|\hat{f}\|_\infty$. For bounded functions $f$, the latter is closely related (by Proposition 2.7 and the remark after it) to $\|\hat{f}\|_4$, which equals the $U^2$-norm, so we can deduce facts related to the $U^2$-norm from the fact that $\|\hat{f}\|_1$ is an algebra norm.

We can regard this argument as carrying out the following procedure. First, we establish an inverse theorem for the $U^2$-norm: this is what we did in Proposition 2.7. We then note that the functions that we obtain in the inverse theorem, namely the characters, are closed under pointwise multiplication. And then we make the following observation.

Lemma 4.4. Let $X$ be a set of functions in $\mathbb{C}^n$ that spans all of $\mathbb{C}^n$, contains the constant function 1, and is closed under pointwise multiplication. Suppose also that $\|\phi\|_\infty \le 1$ for every function $\phi \in X$. Then the norm $\|.\|$ on $\mathbb{C}^n$ defined by the formula

\[ \|f\| = \inf\Bigl\{\sum_{i=1}^k |\lambda_i| : f_1,\dots,f_k \in X,\ f = \sum_{i=1}^k \lambda_if_i\Bigr\} \]

is an algebra norm.

Proof. Suppose that $f = \sum_{i=1}^k \lambda_if_i$ and $g = \sum_{j=1}^l \mu_jg_j$, with all $f_i$ and $g_j$ in $X$. Then $fg = \sum_{i=1}^k\sum_{j=1}^l \lambda_i\mu_jf_ig_j$. Since $X$ is closed under pointwise multiplication, each $f_ig_j$ belongs to $X$. Moreover, $\sum_{i=1}^k\sum_{j=1}^l |\lambda_i||\mu_j| = \sum_{i=1}^k |\lambda_i|\sum_{j=1}^l |\mu_j|$. From this the submultiplicativity follows easily. The fact that $\|1\| = 1$ follows from the assumption that $1 \in X$ and that all functions in $X$ have $L_\infty$-norm at most 1. □

In the case where $X$ is the set of all characters on a finite Abelian group, the norm given by Lemma 4.4 is the $\ell_1$-norm of the Fourier transform.
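For the character example, the algebra-norm properties can be checked directly. The following Python sketch assumes the cyclic group of order 32 and the normalization $\hat{f}(r) = \mathbb{E}_x f(x)e^{-2\pi i rx/N}$ (so that the constant function 1 has norm 1); the function name `fourier_l1` is a hypothetical helper, and the test functions are random.

```python
import numpy as np

def fourier_l1(f):
    """Sum over r of |f^(r)|, where f^(r) = E_x f(x) exp(-2 pi i r x / N)."""
    return np.sum(np.abs(np.fft.fft(f))) / len(f)

rng = np.random.default_rng(2)
f = rng.standard_normal(32)
g = rng.standard_normal(32)

product_norm = fourier_l1(f * g)           # norm of the pointwise product
bound = fourier_l1(f) * fourier_l1(g)      # product of the norms
one_norm = fourier_l1(np.ones(32))         # norm of the constant function 1
sup_norm = np.max(np.abs(f))               # the L_infinity norm of f
print(product_norm, bound, one_norm, sup_norm)
```

Submultiplicativity (`product_norm <= bound`), the normalization `one_norm == 1`, and the domination of the supremum norm are the three facts used repeatedly in this section.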

Now suppose that we want to prove comparable facts about the $U^3$-norm. An obvious approach would be to use Theorem 2.9, the inverse theorem for the $U^3$-norm. However, the generalized quadratic phase functions that appear in the conclusion of that theorem are not quite closed under pointwise multiplication: associated with them are certain parameters that one wants to be small, which obey rules such as $\gamma(fg) \le \gamma(f) + \gamma(g)$.

As we shall see, this is not a serious difficulty, because often one can restrict attention 
to products of a bounded number of functions that an inverse theorem provides. A more 
fundamental problem is that for the higher $U^k$-norms we do not (yet) have an inverse
theorem. Or at least, we do not have an inverse theorem where the function that appears 
in the conclusion can be explicitly described. What Green and Tao did to get round this 
difficulty was to define a class of functions that they called basic anti-uniform functions, 
and to prove a "soft" inverse theorem concerning those functions. 

Definition. For every function $f \in \mathbb{R}^n$, let $\mathcal{D}f$ be the function defined by the formula

\[ \mathcal{D}f(x) = \mathbb{E}_{a,b,c}f(x+a)f(x+b)f(x+c)f(x+a+b)f(x+a+c)f(x+b+c)f(x+a+b+c). \]

Let $X$ be a subset of $\mathbb{R}^n$. A basic anti-uniform function (with respect to $X$) is a function of the form $\mathcal{D}f$ with $f \in X$.

Needless to say, the above definition generalizes straightforwardly to a class of basic anti-uniform functions for the $U^k$-norm, for any $k$. The same applies to the next proposition.

Proposition 4.5. Let $X$ be a subset of $\mathbb{R}^n$ and let $f \in X$ be a function such that $\|f\|_{U^3} \ge \epsilon$. Then there is a basic anti-uniform function $g$, with respect to $X$, such that $\langle f,g\rangle \ge \epsilon^8$.

Proof. The way we have stated the result is rather artificial, since the basic anti-uniform function in question is nothing other than $\mathcal{D}f$. Moreover, it is trivial that $\langle f,\mathcal{D}f\rangle \ge \epsilon^8$, since if we expand the left-hand side we obtain the formula for $\|f\|_{U^3}^8$. □

Of course, the price one pays for such a simple proof is that one has far less information about basic anti-uniform functions than one would have about something like a generalized polynomial phase function. In particular, it is not obvious what one can say about products of basic anti-uniform functions.
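The identity behind Proposition 4.5, namely that $\langle f,\mathcal{D}f\rangle$ expands to the eighth power of the $U^3$-norm, can be verified by brute force on a small cyclic group. The sketch below is an illustration under assumptions not made in the paper: the group is $\mathbb{Z}_7$, the test function is random, the inner product is $\mathbb{E}_x u(x)v(x)$, and the names `cube`, `dual_function` and `u3_eighth_power` are hypothetical helpers.

```python
import itertools
import numpy as np

def cube(x, a, b, c, n):
    """The eight points x + w1*a + w2*b + w3*c (mod n); the first one is x itself."""
    return [(x + w1 * a + w2 * b + w3 * c) % n
            for w1, w2, w3 in itertools.product((0, 1), repeat=3)]

def dual_function(f):
    """Df(x): average over (a, b, c) of the product of f over the seven points other than x."""
    n = len(f)
    out = np.zeros(n)
    for x in range(n):
        for a, b, c in itertools.product(range(n), repeat=3):
            out[x] += np.prod(f[cube(x, a, b, c, n)[1:]])
    return out / n**3

def u3_eighth_power(f):
    """Average over (x, a, b, c) of the product of f over all eight points of the cube."""
    n = len(f)
    total = sum(np.prod(f[cube(x, a, b, c, n)])
                for x, a, b, c in itertools.product(range(n), repeat=4))
    return total / n**4

rng = np.random.default_rng(3)
f = rng.uniform(-1.0, 1.0, size=7)           # hypothetical test function on Z_7
correlation = np.mean(f * dual_function(f))  # <f, Df>
print(correlation, u3_eighth_power(f))       # the two numbers agree up to rounding
```

The brute-force loops cost on the order of $n^4$ operations, so this is only practical for very small $n$.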

We remark here that an inequality proved in [G1] implies easily that $\langle g,\mathcal{D}f\rangle \le \|g\|_{U^3}\|f\|_{U^3}^7$ for every function $g$. Thus, $\|\mathcal{D}f\|_{U^3}^* \le \|f\|_{U^3}^7$. Since $\langle f,\mathcal{D}f\rangle = \|f\|_{U^3}\|f\|_{U^3}^7$, we see that $\mathcal{D}f$ is a support functional for $f$. It is not hard to show that it is unique (up to a scalar multiple). Since every function is a support functional for something, it may seem as though there is something odd about the definition of a basic anti-uniform function. However, it is less all-encompassing than it seems, because we are restricting attention to functions $\mathcal{D}f$ for which $f$ belongs to some specified class of functions $X$. (Nevertheless, we shall usually choose $X$ in such a way that every function is a multiple of a basic anti-uniform function.)

A crucial fact that Green and Tao proved about basic anti-uniform functions is that, for suitable sets $X$, their products have $(U^k)^*$-norms that can be controlled. To be precise, they proved the following lemma. (It is not stated as a lemma, but rather as the beginning step in the proof of their Lemma 6.3.)

Lemma 4.6. For every positive integer $K$ there is a constant $C_K$ such that if $\mathcal{D}f_1,\dots,\mathcal{D}f_K$ are basic anti-uniform functions [with respect to a suitable set $X$], then $\|\mathcal{D}f_1 \cdots \mathcal{D}f_K\|_{U^k}^* \le C_K$.

4.3. A generalization of the Green-Tao-Ziegler transference theorem.

We shall be more concerned with the form of Lemma 4.6 than with the details of what $X$ is, since our aim is to describe in an abstract way the important properties of the operator $f \mapsto \mathcal{D}f$. This we do in the next definition, which is meant to capture the idea that the dual of a certain norm somewhat resembles an algebra norm.

Definition. Let $\|.\|$ be a norm on $\mathbb{R}^n$ such that $\|f\|_\infty \le \|f\|^*$ for every $f \in \mathbb{R}^n$, and let $X$ be a bounded subset of $\mathbb{R}^n$. Then $\|.\|$ is a quasi algebra predual norm, or QAP-norm, with respect to $X$ if there is a (non-linear) operator $\mathcal{D} : \mathbb{R}^n \to \mathbb{R}^n$, a strictly decreasing function $c : \mathbb{R}_+ \to \mathbb{R}_+$, and an increasing function $C : \mathbb{N} \to \mathbb{R}$ with the following properties:

(i) $\langle f,\mathcal{D}f\rangle \le 1$ for every $f \in X$;

(ii) $\langle f,\mathcal{D}f\rangle \ge c(\epsilon)$ for every $f \in X$ with $\|f\| \ge \epsilon$;

(iii) $\|\mathcal{D}f_1 \cdots \mathcal{D}f_K\|^* \le C(K)$ for any functions $f_1,\dots,f_K \in X$.

It will help to explain the terminology if we introduce another norm, which we shall call $\|.\|_{BAC}$. It is given by the formula $\|f\|_{BAC} = \max\{|\langle f,\mathcal{D}g\rangle| : g \in X\}$. The letters "BAC" stand for "basic anti-uniform correlation" here. We shall call the functions $\mathcal{D}f$ with $f \in X$ basic anti-uniform functions, and assume for convenience that they span $\mathbb{R}^n$, so that $\|.\|_{BAC}$ really is a norm. Of course, this norm depends on $X$ and $\mathcal{D}$ but we are suppressing the dependence in the notation.

By Lemma 3.5 the dual of the norm $\|.\|_{BAC}$ is given by the formula

\[ \|f\|_{BAC}^* = \inf\Bigl\{\sum_{i=1}^k |\lambda_i| : f = \sum_{i=1}^k \lambda_i\mathcal{D}f_i,\ f_1,\dots,f_k \in X\Bigr\}. \]

Thus, it measures the ease with which a function can be decomposed into a linear combination of basic anti-uniform functions. In terms of this norm, property (ii) above is telling us that if $f \in X$ and $\|f\| \ge \epsilon$ then $\|f\|_{BAC} \ge c(\epsilon)$. This expresses a rough equivalence between the two norms, of a similar kind to the rough equivalence between $\|f\|_{U^2}$ and $\|\hat{f}\|_\infty$ when $\|f\|_\infty \le 1$. It can also be thought of as a soft inverse theorem; property (iii) then tells us that the functions that we obtain from this inverse theorem have products that are not too big.

Now let us briefly see why Theorem 4.3 generalizes easily from preduals of algebra norms to QAP-norms. The following result is not quite the result alluded to in the title of this subsection, but it is an abstract principle that can be used as part of the proof of the Green-Tao theorem. As we mentioned in the introduction, the proof given here is much shorter and simpler than the proof given by Green and Tao. (This is not quite trivial to verify as they do not explicitly state the result, but the proof here can be used to simplify Section 6 of their paper slightly, and to replace Sections 7 and 8 completely.)

As a first step, we shall generalize Lemma 4.2, the simple result about polynomial approximations. The generalization is equally straightforward: the main difference is merely that we need a modification of the definition of the polynomial $R_P$. Let us suppose that $\|.\|$ is a QAP-norm and let $C : \mathbb{N} \to \mathbb{R}$ be the function given in property (iii) of that definition. If $P$ is the polynomial $P(x) = a_nx^n + \dots + a_1x + a_0$, then we define $R'_P$ to be the polynomial $C(n)|a_n|x^n + \dots + C(1)|a_1|x + |a_0|$: that is, we replace the $k$th coefficient of $P$ by its absolute value and multiply it by $C(k)$. If $J : \mathbb{R} \to \mathbb{R}$ is a continuous function, and $C_1$, $C_2$ and $\delta$ are positive real numbers, we now define $\rho'(C_1,C_2,\delta,J)$ to be twice the infimum of $R'_P(C_2)$ over all polynomials $P$ such that $|P(x) - J(x)| \le \delta$ for every $x \in [-C_1,C_1]$.

Lemma 4.7. Let $\|.\|$ be a QAP-norm, let $J : \mathbb{R} \to \mathbb{R}$ be a continuous function and let $C_1$, $C_2$ and $\delta$ be positive real numbers, with $C_1 = C(1)C_2$. Then there exists a polynomial $P$ such that $\|P\phi - J\phi\|_\infty \le \delta$ and $\|P\phi\|^* \le \rho'(C_1,C_2,\delta,J)$ for every $\phi \in \mathbb{R}^n$ such that $\|\phi\|_{BAC}^* \le C_2$.

Proof. It is immediate from the definition of $\rho'(C_1,C_2,\delta,J)$ that for every $C_1$, $C_2$ and $\delta$ there exists a polynomial $P$ such that $|P(x) - J(x)| \le \delta$ for every $x \in [-C_1,C_1]$, and such that $R'_P(C_2) \le \rho'(C_1,C_2,\delta,J)$.

Next, observe that if $X$ is the set specified in the definition of QAP-norms, and $f$ is a function in $X$, then $\|\mathcal{D}f\|^* \le C(1)$, by property (iii). Therefore, if $\phi \in \mathbb{R}^n$ is a function with $\|\phi\|_{BAC}^* \le C_2$, it follows that $\|\phi\|^* \le C(1)C_2 = C_1$. Then $\|\phi\|_\infty \le C_1$ as well, from the definition of QAP-norms. Since $P$ and $J$ agree to within $\delta$ on $[-C_1,C_1]$, it follows that $\|P\phi - J\phi\|_\infty \le \delta$.

From the formula for $\|\phi\|_{BAC}^*$ and the fact that this is at most $C_2$ it follows that for any $\epsilon > 0$ we can write $\phi$ as a linear combination of basic anti-uniform functions, with the absolute values of the coefficients adding up to at most $C_2 + \epsilon$. Therefore, for any $\epsilon > 0$ we can write $\phi^m$ as a linear combination of products of $m$ basic anti-uniform functions, with the absolute values of the coefficients adding up to at most $C_2^m + \epsilon$. Each of these products has $\|.\|^*$-norm at most $C(m)$, by property (iii) of QAP-norms. Hence, by the triangle inequality, $\|\phi^m\|^* \le C(m)C_2^m$. More generally, if $P$ is the polynomial $P(x) = a_nx^n + \dots + a_1x + a_0$, then by the triangle inequality we obtain that

\begin{align*}
\|P\phi\|^* &\le |a_n|\|\phi^n\|^* + \dots + |a_1|\|\phi\|^* + |a_0| \\
&\le C(n)|a_n|C_2^n + \dots + C(1)|a_1|C_2 + |a_0| \\
&= R'_P(C_2).
\end{align*}

As we remarked at the beginning of the proof, this is at most $\rho'(C_1,C_2,\delta,J)$, so the lemma is proved. □

Theorem 4.8. Let $\mu$ and $\nu$ be non-negative functions on $\{1,2,\dots,n\}$ such that $\|\mu\|_1$ and $\|\nu\|_1$ are both at most 1, and let $\eta,\delta > 0$. Let $\|.\|$ be a QAP-norm on $\mathbb{R}^n$, with respect to the set $X$ of all functions $f \in \mathbb{R}^n$ such that $|f(x)| \le \max\{\mu(x),\nu(x)\}$ for every $x$. Let $J : \mathbb{R} \to \mathbb{R}$ be the function given by $J(x) = (x+|x|)/2$ and let $\epsilon = \delta/2\rho'(C(1)c(\eta)^{-1},c(\eta)^{-1},\delta/4,J)$, where $\rho'$ is defined as in the discussion just above. Suppose that $\|\mu - \nu\| \le \epsilon$. Then for every function $f$ with $0 \le f \le \mu$ there exists a function $g$ such that $0 \le g \le \nu(1-\delta)^{-1}$ and $\|f - g\| \le \eta$.

Proof. An equivalent way of stating the conclusion is that $f = g + h$ with $0 \le g \le \nu(1-\delta)^{-1}$ and $\|h\| \le \eta$. Since such an $h$ will belong to $X$, we know that a sufficient condition for $\|h\|$ to be at most $\eta$ is that $\|h\|_{BAC}$ is at most $c(\eta)$ (since $c$ is strictly decreasing). Thus, if the result is false, then we can find a functional $\phi$ such that $\langle f,\phi\rangle > 1$, but $\langle g,\phi\rangle \le 1$ for every $g$ such that $0 \le g \le \nu(1-\delta)^{-1}$, and $\|\phi\|_{BAC}^* \le c(\eta)^{-1}$. For the rest of the proof, we shall write $C_2$ for $c(\eta)^{-1}$.

As in the proof of Theorem 4.3, the first condition on $\phi$ is equivalent to the statement that $\langle \nu,\phi_+\rangle \le 1-\delta$, and we still have that $\phi_+ = J\phi$. Also, Lemma 4.7 gives us a polynomial $P$ such that $\|P\phi - J\phi\|_\infty \le \delta/4$ and $\|P\phi\|^* \le \rho' = \rho'(C(1)C_2,C_2,\delta/4,J)$.

Since $\langle \nu,\phi_+\rangle \le 1-\delta$ and $\|\nu\|_1 \le 1$, it follows that $\langle \nu,P\phi\rangle \le 1-3\delta/4$. Since $\|P\phi\|^* \le \rho'$ and $\|\mu - \nu\| \le \epsilon$, it follows that $\langle \mu,P\phi\rangle \le 1-3\delta/4+\epsilon\rho'$. Since $\|\mu\|_1 \le 1$, it follows that $\langle \mu,\phi_+\rangle \le 1-\delta/2+\epsilon\rho'$. Since $f \le \mu$ it follows that $\langle f,\phi_+\rangle \le 1-\delta/2+\epsilon\rho'$, and since $f \ge 0$ it follows that $\langle f,\phi\rangle \le 1-\delta/2+\epsilon\rho'$, which is a contradiction. □

The abstract theorem stated and proved by Tao and Ziegler is both more and less general than Theorem 4.8. It is less general in that it takes $\nu$ to be the uniform probability measure (and uses the letter $\nu$ instead of $\mu$, so that the two measures are $\nu$ and 1). But in a small way it is more general: they observe that we did not really need the full strength of the assumptions we made.

4.4. Arithmetic progressions in the primes. 

In this section we shall briefly describe how a special case of Theorem 4.8, the second transference principle we proved earlier in the paper, was used by Green and Tao to prove that the primes contain arbitrarily long arithmetic progressions.

The main idea of their proof is an ingenious way of getting round the difficulty that the primes less than $N$ do not form a dense subset of $\{1,2,\dots,N\}$. This sparseness problem occurs in several places in the literature, and there is a method by which one can sometimes deal with it, which is to exploit the fact that one has a lot of control over random (or random-like) sets. In particular, there are various results that assert that if $X$ is a sparse random-like set and $Y$ is a subset of $X$ that is dense in $X$ (in the sense that $|Y|/|X|$ is bounded below by a positive constant) then $Y$ behaves in a way that is analogous to how a dense set would behave. That is, sparse sets can be handled if you can embed them densely into random-like sets.

Green and Tao reasoned that an approach like this might work for the primes. There 
is a standard technicality to deal with first, which is that the primes are much denser in 
some arithmetic progressions than others. A moment's thought shows that this makes it 
impossible to embed the primes from 1 to $N$ densely into a quasirandom set. However,
one can restrict to an arithmetic progression in which the primes are particularly dense 
(by looking at primes that are congruent to a mod m, where m is the product of the first 
few primes and a is coprime to m), in which this problem effectively disappears. 

To carry out their approach, they needed to do two things. First, they had to prove that 
there was indeed a quasirandom set containing the primes (inside a suitable arithmetic 
progression, but we'll use the word "primes" as a shorthand here) that was not much 
bigger than the primes. If they could do that, then the general principle that relatively 
dense subsets of quasirandom sets behave like dense sets would suggest that the primes 
should behave like a dense set. Since dense sets contain plenty of arithmetic progressions, 
so should the primes. The second stage of their proof was to make this heuristic argument 
rigorous. 

As it turns out, they did not construct a quasirandom superset of the primes, but an object that they called a pseudorandom measure. This was a non-negative function $\nu$ that did not have to be 01-valued, but in other respects behaved like a superset of the primes. (In fact, they normalized it to have average 1, but even then it did not take just one non-zero value.) The construction of $\nu$ was based on very recent (at the time) results of Goldston and Yildirim [GY]. This part of the proof belongs squarely in analytic number theory and we shall say no more about it here.

The other part of the proof proceeded as follows. Let $\nu$ be a pseudorandom measure: that is, a non-negative function defined on $\{1,2,\dots,N\}$ such that $\|\nu\|_1 = 1$, which satisfied certain quasirandom properties. (These properties were similar to, but stronger than, the assertion that $\|\nu - 1\|_{U^k}$ was very small.) Let us call a set $A$ dense relative to $\nu$ if there is a positive constant $\lambda$ such that $\lambda A \le \nu$ and $\|\lambda A\|_1 \ge c$ for some positive constant $c$ that does not depend on $N$. Since $\|\nu - 1\|_{U^k}$ is small, the transference principle of Theorem 4.8 can be used to replace the function $\lambda A$ by a function $f$ that takes values in $[0,1]$ and has the property that $\|f - \lambda A\|_{U^k}$ is small, provided, that is, that the hypotheses of Theorem 4.8 are satisfied.

The programme for completing the proof is therefore clear: one must prove that the hypotheses are indeed satisfied, and one must prove that the fact that $\|f - \lambda A\|_{U^k}$ is small allows us to conclude that $A$ contains arithmetic progressions of length $k + 2$ (as one expects, since in other contexts the $U^k$ norm controls progressions of this length).

Let us briefly recall what these hypotheses are. We define $X$ to be the set of all functions that are bounded above in modulus by $\nu + 1$, and we would like the norm $\|.\|_{U^k}$ to be a QAP-norm with respect to $X$. (These were defined at the beginning of Section 4.3.)

Not surprisingly, as our non-linear operator $\mathcal{D}$ we take the operator defined just before Proposition 4.5 (for the appropriate $k$), except that for convenience we multiply it by $2^{-(k+1)}$.

The first hypothesis is that $\langle f,\mathcal{D}f\rangle \le 1$ for every $f \in X$. It is straightforward to check from Green and Tao's definition of pseudorandomness that $\|f\|_{U^k}$ is at most $2 + o(1)$ for each $f \in X$, and therefore this hypothesis is satisfied.

The second is that $\langle f,\mathcal{D}f\rangle \ge c(\epsilon)$ for every $f \in X$ with $\|f\|_{U^k} \ge \epsilon$. But this is true because, with our definition of $\mathcal{D}$, $\langle f,\mathcal{D}f\rangle = 2^{-(k+1)}\|f\|_{U^k}^{2^k}$. (We have essentially given this argument already, in Proposition 4.5.)

The third is that products of basic anti-uniform functions have bounded $(U^k)^*$-norms. This is a lemma of Green and Tao that we stated as Lemma 4.6. It should be noted that to prove this they required quasirandomness hypotheses on $\nu$ that are stronger than one might expect: in particular they needed more than just that $\nu$ should be close to 1 in some $U^k$ norm. (The precise condition they needed is called the correlation condition in their paper.) It is not known whether there exists an $r$ such that their transference theorem holds under the hypothesis that $\|\nu - 1\|_{U^r}$ is small.

The one remaining ingredient of their argument is what they call a "generalized von Neumann theorem," in which they establish the fact mentioned above, that if $\|f - \lambda A\|_{U^k}$ is small then $A$ contains arithmetic progressions of length $k + 2$. More precisely,

\[ \lambda^{k+2}\,\mathbb{E}_{x,d}A(x)A(x+d)\cdots A(x+(k+1)d) \approx \mathbb{E}_{x,d}f(x)f(x+d)\cdots f(x+(k+1)d). \]

If $A$ is a dense set, so that $\lambda$ is bounded above by a constant independent of $N$, then this is a standard result, but it is quite a bit harder to prove when all one knows about $A$ is that $\lambda A$ is bounded above by a pseudorandom measure.

5. Tao's structure theorem. 

In this section we shall combine some of the methods and results of previous sections in order to obtain a general structure theorem for bounded functions. This result resembles Proposition 3.7 in that we decompose a function $f$ as a sum $f_1 + f_2 + f_3$ with $\|f_1\|^* \le C$, $\|f_2\| \le \eta(C)$, $\|f_3\|_2 \le \epsilon$, but this time we shall assume that $f$ takes values in an interval $[a,b]$ and deduce stronger properties of the functions $f_i$: in particular, $f_1$ will also take values in the interval $[a,b]$. In order to do this, we shall need to use polynomial approximations. It would be possible to prove a result about QAP-norms, but the notation is simpler if we assume the stronger hypothesis that the dual norm $\|.\|^*$ is an algebra norm. As we shall see, this result is general enough to apply in many interesting situations.

Here, then, is the structure theorem we shall prove in this section. Tao's structure the- 
orem is essentially the same result, but for a specific sequence of algebra norms. However, 
his method can easily be modified to prove this more general formulation. (In other words, 
the point of this section is the method of proof rather than the extra generality of the 
conclusion.) We should mention here that there are other results of a similar flavour to 
Tao's, which are often referred to as "arithmetic regularity lemmas". The following result 
can be thought of as an abstract arithmetic regularity lemma. 

Theorem 5.1. Let $\|.\|$ be a norm defined on $\mathbb{R}^n$, and suppose that the dual norm $\|.\|^*$ is an algebra norm. Let $f \in \mathbb{R}^n$ be a function that takes values in the interval $[a,b]$. Let $\eta : \mathbb{R}_+ \to \mathbb{R}_+$ be a positive decreasing function and let $\epsilon > 0$. Then there is a constant $C_0$, depending on $\eta$ and $\epsilon$ only, such that $f$ can be written as a sum $f_1 + f_2 + f_3$, with $\|f_1\|^* \le C_0$, $\|f_2\| \le \eta(\|f_1\|^*)$, and $\|f_3\|_2 \le \epsilon$. Moreover, $f_1$ and $f_1 + f_3$ both take values in $[a,b]$.

The last condition may look slightly strange, but it is important in applications. For instance, for Tao's application to Szemeredi's theorem, $[a,b]$ is the interval $[0,1]$, and $f_1$ is the "structured part" of $f$. The key step in his argument is that $\mathbb{E}_xf_1 \ge \delta$ implies that $\mathbb{E}_{x,d}f_1(x)f_1(x+d)\cdots f_1(x+(k-1)d) \ge c(\delta) > 0$, and more generally that the same is true of $f_1 + f_3$: that is, after a small $L_2$-perturbation of the function $f_1$. However, $c(\delta)$ is much smaller than $\delta$; as a result, it is crucial that both $f_1$ and $f_1 + f_3$ should be positive, so that $c(\delta)$ is not swamped by a negative error term.

There is a simple way of making Theorem 5.1 more general, and this is very important for some applications, including Tao's application to Szemeredi's theorem. In order to explain the generalization, it will be convenient to introduce another definition.

Definition. Let $\|.\|$ and $|.|^*$ be two norms on $\mathbb{R}^n$ and let $c : (0,1] \to (0,1]$ be a strictly increasing function. Then $|.|^*$ is an approximate dual (at rate $c$) for $\|.\|$ if the following two conditions hold:

(i) $\langle f,\phi\rangle \le \|f\||\phi|^*$ for any two functions $f$ and $\phi$ in $\mathbb{R}^n$;

(ii) if $\|f\|_\infty \le 1$ and $\|f\| \ge \epsilon$ then there exists $\phi$ such that $|\phi|^* \le 1$ and $\langle f,\phi\rangle \ge c(\epsilon)$. (Equivalently, $|f| \ge c(\epsilon)$, where $|.|$ is the predual of $|.|^*$.)

The first of these estimates is equivalent to the assertion that $\|\phi\|^* \le |\phi|^*$ for every $\phi \in \mathbb{R}^n$. The second is equivalent to the assertion that if $\|f\|_\infty \le 1$, then $|f| \ge c(\|f\|)$. Therefore, if a norm $\|.\|$ merely has an approximate dual $|.|^*$ that is an algebra norm, we can apply Theorem 5.1 to the norm $|.|$ and conclude that $|f_1|^* \le C_0$, $\|f_2\| \le c^{-1}(\eta(|f_1|^*))$ and $\|f_3\|_2 \le \epsilon$. Since $\eta$ can be chosen to tend to zero arbitrarily fast, so can $c^{-1} \circ \eta$. Thus, Theorem 5.1 has the following immediate corollary.

Corollary 5.2. Let $\|.\|$ and $|.|$ be two norms on $\mathbb{R}^n$ and suppose that $|.|^*$ is an approximate dual for $\|.\|$. Let $f$ be a function that takes values in an interval $[a,b]$. Let $\eta : \mathbb{R}_+ \to \mathbb{R}_+$ be a positive decreasing function and let $\epsilon > 0$. Then there is a constant $C_0$, depending only on $\eta$, $\epsilon$ and the function $c$ that appears in the specification of the approximate duality, such that $f$ can be written as a sum $f_1 + f_2 + f_3$, with $|f_1|^* \le C_0$, $\|f_2\| \le \eta(\|f_1\|^*)$, and $\|f_3\|_2 \le \epsilon$. Moreover, $f_1$ and $f_1 + f_3$ both take values in $[a,b]$.

5.1. A proof of the structure theorem. 

The proof we shall give in this paper is quite different from that of Tao. The main idea is to start with a decomposition obtained using Proposition 3.7 (which was an easy consequence of the Hahn-Banach theorem) and to adjust it until the functions $f_1$ and $f_1 + f_3$ have the right ranges. During the process of adjustment, we shall have cause to use Theorem 4.3, the first of the transference theorems obtained in the previous section. The proof is conceptually very simple, but it involves a longish sequence of small calculations to check that the errors that we introduce when we adjust our decomposition are small.

To begin with, then, let $\theta$ be a decreasing positive function and $\beta$ a positive constant, both to be specified later, and apply Proposition 3.7 to write $f$ as $f_1 + f_2 + f_3$ with $\|f_1\|^* = K$, $\|f_2\| \le \theta(K)$ and $\|f_3\|_2 \le \beta$. Here, $K$ is bounded above by a function of $\theta$ and $\beta$, so later we shall need $\theta$ and $\beta$ to depend only on $\eta$ and $\epsilon$.

We would now like to modify $f_1$ so that it takes values in the interval $[a,b]$. The obvious way of doing this is to apply Lemma 4.2 with the continuous function $J$ that takes the value $a$ when $x < a$, $b$ when $x > b$ and $x$ when $x \in [a,b]$. This gives us a new function $Pf_1$ such that $\|Pf_1 - Jf_1\|_\infty \le \delta$ and $\|Pf_1\|^* \le \rho = \rho(K,\delta,J)$. The first inequality implies that $Pf_1$ takes values in $[a-\delta,b+\delta]$, and a small adjustment will correct that to $[a,b]$. However, before we do the adjustment, let us check that $f_1 - Pf_1$ is small in an appropriate sense. Intuitively, this is plausible: it should not be possible for the structured part of $f$ to stray too far from the interval $[a,b]$ for too long. This intuition turns out to be correct, and proving it rigorously is not very hard.

Lemma 5.3. Let $f_1$ and $Pf_1$ be the functions just defined. Then provided that the function
$\theta$ is sufficiently small (in terms of $a$, $b$ and $\beta$), we have the inequality $\|f_1 - Pf_1\|_2 \le 3\beta/2$.

Proof. From the decomposition $f = f_1 + f_2 + f_3$ we obtain the decomposition

$$f_1 - Pf_1 = (f - Pf_1) - f_2 - f_3.$$

We shall now bound $\|f_1 - Pf_1\|_2^2$ by looking at the inner products of $f_1 - Pf_1$ with each
of the three terms on the right-hand side.

First of all, if $f_1(x) > b$ then $Jf_1(x) = b$, so $|Pf_1(x) - b| \le \delta$, and therefore
$f_1(x) - Pf_1(x) \ge -\delta$. Since $f$ takes values in $[a,b]$, we also find that $f(x) - Pf_1(x) \le \delta$. Similarly,
if $f_1(x) < a$ then we find that $f_1(x) - Pf_1(x) \le \delta$ and $f(x) - Pf_1(x) \ge -\delta$. If $f_1(x) \in [a,b]$,
then $f_1(x) = Jf_1(x)$, so $|f_1(x) - Pf_1(x)| \le \delta$, and $|f(x) - Pf_1(x)| \le b - a + 2\delta$. It follows
from these three estimates that $\langle f_1 - Pf_1, f - Pf_1\rangle \le 4\delta^2 + \delta(b-a)$.

Since $\|f_1\|^* \le K$, $\|Pf_1\|^* \le \rho$, and $\|f_2\| \le \theta(K)$, it follows that

$$|\langle f_1 - Pf_1, f_2\rangle| \le (K + \rho)\theta(K).$$

For the third inner product we use Cauchy-Schwarz to give a trivial implicit estimate:

$$|\langle f_1 - Pf_1, f_3\rangle| \le \beta\|f_1 - Pf_1\|_2.$$

From the estimates for these inner products it follows that

$$\|f_1 - Pf_1\|_2^2 \le 4\delta^2 + \delta(b-a) + (K+\rho)\theta(K) + \beta\|f_1 - Pf_1\|_2.$$

Therefore, if we choose $\delta$ such that $4\delta^2 + \delta(b-a) \le \beta^2/4$ and $\theta$ in such a way that
$(K + \rho(K,\delta,J))\theta(K) \le \beta^2/2$ for every $K$, then

$$\|f_1 - Pf_1\|_2^2 \le \beta^2/4 + \beta^2/2 + \beta\|f_1 - Pf_1\|_2,$$

from which it follows, on completing the square, that $\|f_1 - Pf_1\|_2 \le 3\beta/2$, as claimed.
To complete the proof, note that the condition on $\theta$ depends on $\rho$, and hence on $\delta$, and $\delta$
depends on $a$, $b$ and $\beta$. □
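The final step of this proof, completing the square, can be written out as follows. This is only a sketch of the arithmetic; the shorthand $x$ for $\|f_1 - Pf_1\|_2$ is introduced here for convenience and does not appear in the text.

```latex
% Shorthand (an assumption of this sketch): x = \|f_1 - Pf_1\|_2 \ge 0.
\begin{align*}
x^2 &\le \tfrac{\beta^2}{4} + \tfrac{\beta^2}{2} + \beta x
   && \text{(the bound obtained from the three inner products)} \\
x^2 - \beta x + \tfrac{\beta^2}{4} &\le \tfrac{3\beta^2}{4} + \tfrac{\beta^2}{4} = \beta^2
   && \text{(subtract $\beta x$, then add $\beta^2/4$ to both sides)} \\
\bigl(x - \tfrac{\beta}{2}\bigr)^2 &\le \beta^2 \\
x &\le \tfrac{\beta}{2} + \beta = \tfrac{3\beta}{2}.
\end{align*}
```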

The next step is very simple. Let $L$ be the linear function that takes $a - \delta$ to $a$ and $b + \delta$
to $b$. Then $|L(x) - x|$ is at most $\delta$ for every $x$ in the interval $[a-\delta, b+\delta]$. Since $Pf_1$ takes
values in this interval, it follows that $\|LPf_1 - Pf_1\|_\infty \le \delta$. Also, if we write $L(x) = \lambda x + \mu$,
it is easy to see that $0 < \lambda < 1$, from which it follows that $\|LPf_1\|^* \le \|Pf_1\|^* + |\mu|$ (since
$\|1\|^* = 1$). A small calculation shows that $\mu = -(a+b)\delta/(a - b - 2\delta)$, so $\|LPf_1\|^* \le 2\rho$,
provided $\delta$ is moderately small (depending on $a$ and $b$).
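The "small calculation" for $L$ can be sketched as follows. It assumes only that $b > a$ and $\delta > 0$, and it uses the two defining conditions $L(a-\delta) = a$ and $L(b+\delta) = b$.

```latex
% L(x) = \lambda x + \mu is determined by L(a-\delta) = a and L(b+\delta) = b.
% Assumed throughout: b > a and \delta > 0.
\begin{align*}
\lambda\,(b - a + 2\delta) &= b - a
  \quad\Longrightarrow\quad \lambda = \frac{b-a}{b-a+2\delta} \in (0,1), \\
\mu &= a - \lambda(a-\delta) = \frac{(a+b)\,\delta}{b-a+2\delta}
     = -\frac{(a+b)\,\delta}{a-b-2\delta}, \\
L(x) - x &= (\lambda - 1)x + \mu = \frac{\delta\,(a+b-2x)}{b-a+2\delta}.
\end{align*}
% The last expression is linear in x, equals \delta at x = a-\delta and
% -\delta at x = b+\delta, so |L(x) - x| \le \delta on [a-\delta, b+\delta].
```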

Let us now see where we have reached. We started with a decomposition $f = f_1 + f_2 + f_3$,
and we have now modified $f_1$, first to $Pf_1$ and then to $LPf_1$. The first modification incurred
an extra error of $L_2$-norm at most $3\beta/2$, and the second an extra error of $L_\infty$-norm, and
hence $L_2$-norm, at most $\delta$. If we assume that $\delta \le \beta/2$ then we find that we have a
decomposition $f = g_1 + g_2 + g_3$, where $g_1 = LPf_1$, $g_2 = f_2$, and $g_3 = f_3 + (f_1 - LPf_1)$. We
have shown that $\|g_1\|^* \le 2\rho$, that $\|g_2\| = \|f_2\| \le \theta(K)$, and that $\|g_3\|_2 \le \beta + 3\beta/2 + \beta/2 =
3\beta$. Moreover, $g_1$ takes values in the interval $[a,b]$.
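The bookkeeping behind the bound on $g_3$ can be spelled out in a few lines. The sketch assumes that the $L_p$ norms are taken with respect to the uniform probability measure, so that the $L_2$ norm of a function is at most its $L_\infty$ norm.

```latex
% Assumption: L_p norms are normalized (averages, not sums), so that
% \|h\|_2 \le \|h\|_\infty for every function h.
\begin{align*}
g_1 + g_2 + g_3 &= LPf_1 + f_2 + f_3 + (f_1 - LPf_1) = f_1 + f_2 + f_3 = f, \\
g_3 &= f_3 + (f_1 - Pf_1) + (Pf_1 - LPf_1), \\
\|g_3\|_2 &\le \|f_3\|_2 + \|f_1 - Pf_1\|_2 + \|Pf_1 - LPf_1\|_\infty \\
          &\le \beta + \tfrac{3\beta}{2} + \delta
           \le \beta + \tfrac{3\beta}{2} + \tfrac{\beta}{2} = 3\beta .
\end{align*}
```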

This gives us most of what we want (if we choose $\beta$ and $\theta$ appropriately). The main
thing we are missing is any information about the range of $g_1 + g_3$. In order to obtain the
extra property that $g_1 + g_3$ takes values in $[a,b]$, we shall focus on the equivalent problem
of ensuring that $f(x) - b \le g_2(x) \le f(x) - a$ for every $x$.
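The equivalence used here is a one-line rearrangement, valid for any decomposition $f = g_1 + g_2 + g_3$; it is recorded below as a sketch because the same rearrangement is applied again after each later adjustment of the second term.

```latex
% Since f = g_1 + g_2 + g_3, we have g_1 + g_3 = f - g_2 pointwise.
\begin{align*}
a \le g_1(x) + g_3(x) \le b
 &\iff a \le f(x) - g_2(x) \le b \\
 &\iff f(x) - b \le g_2(x) \le f(x) - a .
\end{align*}
```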

Note that $f(x) - b \le 0$ and $f(x) - a \ge 0$ for every $x$. Our strategy for obtaining these
bounds on $g_2$ is even simpler than our strategy for adjusting $f_1$ earlier: we shall replace
$g_2(x)$ by $f(x) - a$ whenever $g_2(x) > f(x) - a$, and similarly on the other side. However, if
that is all we do then we lose all information about $\|g_2\|$. This is where Theorem 4.3, our
first transference theorem, comes in: when we adjust the positive part of $g_2$ we can use
Theorem 4.3 to make a complementary adjustment to the negative part, and vice versa.

Let us therefore set $g_2'(x)$ to be $\min\{g_2(x), f(x) - a\}$ for each $x$. First we need a simple
lemma.

Lemma 5.4. If $g_2' = \min\{g_2, f - a\}$ then $\|g_2 - g_2'\|_2 \le 3\beta$.

Proof. For every $x$, either $g_2(x) - g_2'(x) = 0$ or

$$0 < g_2(x) - g_2'(x) = g_2(x) - f(x) + a = a - g_1(x) - g_3(x) \le -g_3(x),$$

where the last inequality follows from the fact that $g_1(x) \in [a,b]$ for every $x$. It follows
that $\|g_2 - g_2'\|_2 \le \|g_3\|_2$, which we have established to be at most $3\beta$. □

Our first attempt at adjusting the decomposition is to write

$$f = g_1 + g_2' + (g_3 + g_2 - g_2').$$

Our main problem now is that we do not have a good estimate for $\|g_2'\|$. To deal with
this, we shall adjust the negative part of $g_2'$ as well, using Theorem 4.3. Let $\mu = (g_2)_+$
and let $\nu = (g_2)_-$. Then $\mu$ and $\nu$ are disjointly supported, so both $\|\mu\|_1$ and $\|\nu\|_1$ are at
most $\|g_2\|_1$, which is at most $\|g_2\|_2$. Since $g_2 = (f - g_1) - g_3$ and $f$ and $g_1$ take values
in $[a,b]$, $\|g_2\|_2 \le |b - a| + 3\beta$, by the triangle inequality and our estimate for $\|g_3\|_2$. Let
$\alpha = |b - a| + 3\beta$.

We now apply Theorem 4.3 with $\mu$ and $\nu$ as above and with the function $(g_2')_+$ playing the
role of $f$ in that theorem; for the rest of this paragraph, $f$ denotes $(g_2')_+$. Strictly speaking,
this is not quite accurate, since the upper bounds for $\|\mu\|_1$ and $\|\nu\|_1$ are $\alpha$ rather than 1,
but we can look at the functions $\alpha^{-1}\mu$, $\alpha^{-1}\nu$ and $\alpha^{-1}f$ instead. The main hypothesis
we have is that $\|g_2\| = \|\mu - \nu\| \le \theta(K)$, so we can take $\epsilon$ to be $\alpha^{-1}\theta(K)$ in Theorem
4.3. If $\tau > 0$ is a constant such that $\alpha^{-1}\theta(K) = \delta/2\rho(\alpha\tau^{-1}, \beta/4, J)$ (where now $J(x)$
is the function $(x + |x|)/2$), then we may conclude that there is a function $g$ such that
$0 \le g \le \nu(1 - \delta)^{-1}$ and $\|f - g\| \le \tau$. The important thing to note here is that $\tau$ tends to
zero as $\theta(K)$ tends to zero.

Define $g_2''$ to be $(g_2')_+ - g$. This gives us a decomposition

$$f = g_1 + g_2'' + [g_3 + (g_2 - g_2') + (g_2' - g_2'')].$$

We have the upper bounds $g_2''(x) \le f(x) - a$ for every $x$, and $\|g_2''\| \le \tau$. However, we have
not yet checked that $\|g_2' - g_2''\|_2$ is small. For this we need another simple lemma.

Lemma 5.5. Let $\nu \in \mathbb{R}^n$ be a non-negative function and suppose that $\nu$ can be written as
a sum $\nu_1 + \nu_2$, where $\|\nu_1\|_\infty \le \alpha$ and $\|\nu_2\|_2 \le \gamma$. Then $\|h\|_2 \le \gamma + (\alpha\|h\|_1)^{1/2}$ for any
function $h$ with $0 \le h \le \nu$.

Proof. By the positivity of $h$ and $\nu$,

$$\|h\|_2^2 \le \langle h, \nu_1 + \nu_2\rangle \le \alpha\|h\|_1 + \gamma\|h\|_2.$$

The bound stated is an easy consequence of this. □
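For completeness, here is one way to extract the stated bound from the displayed inequality. The shorthand $y$ and $A$ is introduced for this sketch only.

```latex
% Shorthand (assumptions of this sketch): y = \|h\|_2 and A = \alpha\|h\|_1.
% First line of the proof: 0 \le h \le \nu gives h^2 \le h\nu pointwise, so
\begin{align*}
\|h\|_2^2 &\le \langle h, \nu\rangle
   = \langle h, \nu_1\rangle + \langle h, \nu_2\rangle \\
  &\le \|\nu_1\|_\infty \|h\|_1 + \|\nu_2\|_2 \|h\|_2
   \le \alpha\|h\|_1 + \gamma\|h\|_2 .
\end{align*}
% Solving the quadratic inequality y^2 \le A + \gamma y:
\begin{align*}
\bigl(y - \tfrac{\gamma}{2}\bigr)^2 &\le A + \tfrac{\gamma^2}{4}, \\
y &\le \tfrac{\gamma}{2} + \sqrt{A + \tfrac{\gamma^2}{4}}
   \le \tfrac{\gamma}{2} + \sqrt{A} + \tfrac{\gamma}{2}
   = \gamma + (\alpha\|h\|_1)^{1/2}.
\end{align*}
```

The only inequality used in the last line is $\sqrt{s+t} \le \sqrt{s} + \sqrt{t}$ for non-negative $s$ and $t$.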

Corollary 5.6. Let $g''$ be any function such that $g''(x) = g_2'(x)$ when $g_2(x)$ is non-negative,
and $0 \ge g''(x) \ge g_2(x)$ otherwise. Suppose also that $\|g''\| \le \tau$. Then $\|g'' - g_2'\|_2 \le
3\beta + (\alpha(\tau + 3\beta))^{1/2}$.

Proof. It follows from the hypotheses that $0 \le g''(x) - g_2'(x) \le \nu(x)$ for every $x$. Recall
that $g_2 = (f - g_1) - g_3$ and that $\|f - g_1\|_\infty \le b - a$. It follows easily that $\nu = (g_2)_-$ satisfies
the conditions of Lemma 5.5, with $\gamma = 3\beta$. (We could improve $\alpha$ to $b - a$, but this is not
worth bothering about.)

Applying the lemma, we deduce that $\|g'' - g_2'\|_2 \le 3\beta + (\alpha\|g'' - g_2'\|_1)^{1/2}$. Now let us
turn our attention to bounding $\|g'' - g_2'\|_1$. Since $\|g''\| \le \tau$ and $\|.\|^*$ is an algebra
norm, it follows from Lemma 4.1 that $|\mathbb{E}_x g''(x)| \le \tau$. But $|\mathbb{E}_x g''(x)| \ge
\|g'' - g_2'\|_1 - \|g_2 - g_2'\|_1$, since $g'' \ge g_2'$, and $\|g_2 - g_2'\|_1 \le \|g_2 - g_2'\|_2$, which we have already
shown is at most $3\beta$. Therefore, $\|g'' - g_2'\|_1 \le \tau + 3\beta$. Inserting this bound into the
estimate at the beginning of this paragraph, we find that $\|g'' - g_2'\|_2 \le 3\beta + (\alpha(\tau + 3\beta))^{1/2}$,
as claimed. □

Since the function $g_2''$ constructed earlier satisfies the hypotheses required of $g''$ in
Corollary 5.6, we now have an improved decomposition $f = h_1 + h_2 + h_3$, where $h_1 = g_1$,
$h_2 = g_2''$ and $h_3 = g_3 + g_2 - g_2''$. Our arguments so far have shown that $h_1$ takes values
in $[a,b]$, that $\|h_1\|^* \le 2\rho$, that $\|h_2\| \le \tau$, that $h_2(x) \le f(x) - a$ for every $x$, and that
$\|h_3\|_2 \le 6\beta + 3\beta + (\alpha(\tau + 3\beta))^{1/2}$. If $\beta$ is sufficiently small (depending on $b - a$ if that is
small, which in a typical application it will not be), and $\tau$ is sufficiently small (depending
on $\beta$), then this last quantity is at most $2(\beta(b - a))^{1/2}$, which we shall call $\zeta$.
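To see that the final claim is consistent, here is one explicit (and far from optimal) pair of smallness conditions under which it holds. The thresholds $\beta \le (b-a)/10^4$ and $\tau \le \beta/10$ are assumptions made for this sketch, not conditions taken from the text.

```latex
% Write D = b - a > 0, so that \alpha = D + 3\beta.
% Assumed for illustration: \beta \le D/10^4 and \tau \le \beta/10.
\begin{align*}
9\beta &= 9\sqrt{\beta}\,\sqrt{\beta} \le \tfrac{9}{100}\sqrt{\beta D}
   && \text{(since $\sqrt{\beta} \le \sqrt{D}/100$)} \\
\alpha(\tau + 3\beta) &\le (1.0003\,D)(3.1\,\beta) \le 3.101\,\beta D, \\
\bigl(\alpha(\tau+3\beta)\bigr)^{1/2} &\le 1.761\,\sqrt{\beta D}, \\
6\beta + 3\beta + \bigl(\alpha(\tau+3\beta)\bigr)^{1/2}
  &\le (0.09 + 1.761)\sqrt{\beta D} \le 2\,(\beta(b-a))^{1/2} .
\end{align*}
```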

From the way we constructed $h_2$, we know that the sign of $h_2$ is the same as that of $g_2$,
and that $|h_2(x)| \le |g_2(x)|$ for every $x$. We have obtained the upper bound of $f - a$ that
we wanted for $h_2$; now we need a further adjustment in order to obtain a lower bound of
$f - b$. It is obvious how to do this: we shall sketch the argument only very briefly.

First, we let $h_2'(x) = \max\{h_2(x), f(x) - b\}$ for every $x$. Then a simple modification of
Lemma 5.4 shows that $\|h_2 - h_2'\|_2 \le 3\zeta$.

Next, we use Theorem 4.3 to reduce the positive part of $h_2'$, while leaving the negative
part unchanged, to create a function $h_2''$ with $\|h_2''\|$ small. If we let $\alpha' = b - a + 3\zeta$, then the
same argument as before gives us an upper bound $\|h_2''\| \le \kappa$, where $\kappa$ is a constant such
that $\alpha'^{-1}\tau = \delta/2\rho(\alpha'\kappa^{-1}, \zeta/4, J)$. In particular, $\kappa$ tends to zero as $\tau$ tends to zero.

Next, a simple modification of Corollary 5.6 tells us that $\|h_2' - h_2''\|_2 \le 3\zeta + (\alpha'(\kappa + 3\zeta))^{1/2}$.
Therefore, we have a decomposition $f = u_1 + u_2 + u_3$, with $u_1 = h_1$, $u_2 = h_2''$ and $u_3 =
h_3 + h_2 - h_2''$. Since $u_1 = h_1$, it takes values in $[a,b]$. The construction of $h_2''$ guarantees
that $f(x) - b \le u_2(x) \le f(x) - a$, and hence that $u_1 + u_3$ takes values in $[a,b]$. Finally, we
have the estimates $\|u_1\|^* \le 2\rho$, $\|u_2\| \le \kappa$, and $\|u_3\|_2 \le 6\zeta + 3\zeta + (\alpha'(\kappa + 3\zeta))^{1/2}$. If $\zeta$ is
small enough (depending on $b - a$) and $\kappa$ is small enough (depending on $\zeta$), then this last
quantity is at most $2((b - a)\zeta)^{1/2}$.

Now let us see why these estimates are enough, recalling from the beginning of the proof
that we are free to choose $\beta$ and $\theta$. To begin with, we need $2((b - a)\zeta)^{1/2}$ to be at most $\epsilon$.
But $\zeta$ tends to zero with $\beta$, so this is easily achieved. Next, recall that $\rho = \rho(K, \delta, J)$. We
would like $\kappa$ to be at most $\eta(\rho)$, which we shall ensure by making a suitable choice of $\theta$.
The constant $\delta$ depends on $\beta$, $a$ and $b$ only, while $\kappa$ tends to zero with $\tau$, which tends to
zero with $\theta(K)$. Thus, for each $K$ we can choose $\theta(K)$ in a way that depends on $K$, $\beta$, $a$
and $b$ only, such that $\kappa \le \eta(\rho)$. The proof is complete.
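The order in which the parameters are chosen can be summarized as follows. This is a sketch assembled from the conditions stated during the proof; the explicit threshold for $\beta$ comes from substituting $\zeta = 2(\beta(b-a))^{1/2}$, and it ignores the additional smallness conditions on $\beta$, $\tau$ and $\kappa$ that the proof also imposes.

```latex
\begin{align*}
2\bigl((b-a)\zeta\bigr)^{1/2} \le \epsilon
  &\iff \zeta \le \frac{\epsilon^2}{4(b-a)}, \\
\zeta = 2\bigl(\beta(b-a)\bigr)^{1/2} \le \frac{\epsilon^2}{4(b-a)}
  &\iff \beta \le \frac{\epsilon^4}{64\,(b-a)^3}.
\end{align*}
% Order of choices:
%   1. \beta from \epsilon, a, b (as above);
%   2. \delta from \beta, a, b:
%        4\delta^2 + \delta(b-a) \le \beta^2/4  and  \delta \le \beta/2;
%   3. for each K: \rho = \rho(K,\delta,J), then \theta(K) small enough that
%        (K+\rho)\theta(K) \le \beta^2/2
%      and that \tau, and hence \kappa, is small enough for \kappa \le \eta(\rho).
```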

5.2. Decomposition theorems with bounds on ranges. 

As a simple application of Theorem 5.1 we shall now obtain the improvement that we
promised earlier to our results about deducing decomposition theorems from inverse
theorems. So far, we have shown that a function can be decomposed into a multiple of a convex
combination of structured functions, plus an error, provided that we have a suitable
inverse theorem concerning the structured functions and the kind of error we are prepared
to allow. As we commented, it is sometimes useful to obtain a decomposition for which
the "structured part" is bounded. We shall see that Theorem 5.1 implies rather easily
that such a decomposition exists, and it has the added advantage of yielding an $L_2$ error
term rather than the $L_1$ error term that appears in Theorem 3.8 or weaker theorems of a
similar type that were discussed earlier. However, the bound on the sum of
the coefficients of the structured functions is very bad. For some applications, this is not
a concern, but for others it turns out to be preferable to use weaker theorems.

Theorem 5.7. Let $\|.\|$ be a norm on $\mathbb{R}^n$ and let $\Phi \subset \mathbb{R}^n$ be a set of functions satisfying
the following properties for some strictly increasing function $c : (0,1] \to (0,1]$:

(i) $\Phi$ contains the constant function 1, $\Phi = -\Phi$, $\|\phi\|_\infty \le 1$ for every $\phi \in \Phi$, and the
linear span of $\Phi$ is $\mathbb{R}^n$;

(ii) $\langle f, \phi\rangle \le 1$ for every $f$ with $\|f\| \le 1$ and every $\phi \in \Phi$;

(iii) if $\|f\|_\infty \le 1$ and $\|f\| \ge \epsilon$ then there exists $\phi \in \Phi$ such that $\langle f, \phi\rangle \ge c(\epsilon)$.

Let $\epsilon > 0$ and let $\eta : \mathbb{R}_+ \to \mathbb{R}_+$ be a strictly decreasing function. Then there is a constant
$M_0$, depending only on $\epsilon$ and the functions $c$ and $\eta$, such that every function $f \in \mathbb{R}^n$ that
takes values in $[0,1]$ can be decomposed as a sum $f_1 + f_2 + f_3$, with the following properties:
$f_1$ and $f_1 + f_3$ take values in $[0,1]$; $f_1$ is of the form $\sum_i \lambda_i\psi_i$, where $\sum_i |\lambda_i| = M \le M_0$ and
each $\psi_i$ is a product of functions in $\Phi$; $\|f_2\| \le \eta(M)$; $\|f_3\|_2 \le \epsilon$.

Proof. Let $\Psi$ be the set of all products of functions in $\Phi$. Define a norm $|.|^*$ by taking $|g|^*$
to be the infimum of all sums $\sum_i |\lambda_i|$ such that $g$ can be written as $\sum_i \lambda_i\psi_i$ with every $\psi_i$
in $\Psi$. It is straightforward to check that this is an algebra norm. (The fact that it is a
norm rather than a seminorm relies on the boundedness of functions in $\Phi$, which one could
in fact deduce from (ii) rather than stating as a separate assumption.) Moreover, (ii) and
(iii) imply easily that $|.|^*$ is an approximate dual for $\|.\|$. Therefore, Corollary 5.2 implies
the result. □

The following simple trick is important in applications. Property (iii) in the statement
of the theorem is the assertion that there is an inverse theorem relating the norm $\|.\|$
to the set of functions in $\Phi$. However, the conclusion of the theorem concerns products of
functions in $\Phi$, and in practice it often happens that the set $\Phi$ of functions that one obtains
from an inverse theorem is not closed under pointwise multiplication. However, it also
often happens that one can give an explicit description of products of functions in $\Phi$, and
that this description becomes only gradually less useful as the number of functions in the
product increases. Under such circumstances, one can replace $\Phi$ by the set $\{1, -1\} \cup (\Phi/2)$.
This modified set clearly satisfies all the hypotheses that $\Phi$ was required to satisfy, but
now the corresponding set $\Psi$ comes with a "penalty" of $2^{-k}$ attached to a product of $k$
functions. This means that the terms of $f_1$ that come from products of significantly more than
$\log_2 M_0$ functions in $\Phi$ make a very small (in $L_\infty$) contribution to $f_1$ and can be absorbed
into the error term.

5.3. Applying Tao's structure theorem.

We shall not actually give applications of the structure theorem here, but merely comment
on how it is applied. The rough idea, as we have already seen, is to express a bounded
function (such as, for instance, the characteristic function of a dense subset of $\mathbb{Z}_N$) as a
sum of a structured part, a quasirandom part, and an $L_2$ error. To do this, we need to
choose a norm $\|.\|$ that measures quasirandomness in a useful way, such that its dual norm
$\|.\|^*$ is an algebra norm with the property that if $\|\phi\|^*$ is bounded then we "understand"
$\phi$ and can regard it as structured. As we have seen, a simple (but useful) example of such
a norm is $\|f\| = \|\hat f\|_\infty$.

Let us briefly consider this example. If we have written a function $f$ as $f_1 + f_2 + f_3$ in
such a way that $\|\hat f_1\|_1 \le C$, $\|\hat f_2\|_\infty \le \eta(C)$ and $\|f_3\|_2 \le \epsilon$, then we can analyse it as follows.

We first show that $f_1$ is "approximately smooth" in the following sense. Let $\delta, \theta > 0$
be small constants to be chosen later, and let $K$ be the set of all $r$ such that $|\hat f_1(r)| \ge \delta$.
Since $\|\hat f_1\|_2^2 = \|f_1\|_2^2 \le 1$, it follows that $|K| \le \delta^{-2}$. Now let $B$ be the set of all $x \in \mathbb{Z}_N$
such that $|\omega^{rx} - 1| \le \theta$ for every $r \in K$. Sets like $B$ are called Bohr neighbourhoods and
have many good properties, but for now we remark merely that a fairly straightforward
argument shows that the cardinality of $B$ is at least $(\theta/2\pi)^{|K|}N$.

Now let $\beta$ be the characteristic measure of $B$: that is, the function that takes the value
$N/|B|$ on $B$ and 0 elsewhere. This multiple of the characteristic function of $B$ is chosen so that
$\|\beta\|_1 = 1$. A useful property of $\beta$ is that $f_1$ is close to $f_1 * \beta$ in $L_2$. This can be shown with the help of Fourier
transforms: the general method is known as Bogolyubov's method, and it is a very useful
tool in additive combinatorics. We begin by observing that

$$\|f_1 - f_1 * \beta\|_2^2 = \|\hat f_1 - \hat f_1\hat\beta\|_2^2 = \sum_{r \in K} |\hat f_1(r)|^2|1 - \hat\beta(r)|^2 + \sum_{r \notin K} |\hat f_1(r)|^2|1 - \hat\beta(r)|^2.$$

For every $r \in K$ we have $|1 - \hat\beta(r)| = |\mathbb{E}_{x \in B}(1 - \omega^{rx})| \le \theta$, so the first sum is at most
$\theta^2\|\hat f_1\|_2^2 \le \theta^2$. We also have the trivial estimate that $|1 - \hat\beta(r)| \le 2$, so the second sum is at
most $4\delta\|\hat f_1\|_1 \le 4\delta C$. Thus, by choosing $\delta$ and $\theta$ appropriately, we can ensure that $f_1$ and
$f_1 * \beta$ are close in $L_2$, as claimed.
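The Fourier-analytic steps used above can be sketched in a few lines. The normalizations of the transform and of the convolution are assumptions of this sketch (they are the usual ones for which the transform of a convolution is the product of the transforms and Parseval holds without a constant), and the target accuracy $\xi$ is a symbol introduced only for illustration.

```latex
% Assumed conventions: \omega = e^{2\pi i/N},
%   \hat f(r) = \mathbb{E}_x f(x)\,\omega^{-rx},   (f*g)(x) = \mathbb{E}_y f(y)\,g(x-y),
%   \|f\|_2^2 = \mathbb{E}_x |f(x)|^2,             \|\hat f\|_p^p = \sum_r |\hat f(r)|^p.
% Size of the large spectrum K = \{ r : |\hat f_1(r)| \ge \delta \}:
\[
\delta^2\,|K| \le \sum_{r \in K} |\hat f_1(r)|^2
  \le \sum_{r} |\hat f_1(r)|^2 = \|f_1\|_2^2 \le 1
  \quad\Longrightarrow\quad |K| \le \delta^{-2}.
\]
% Closeness of f_1 and f_1 * \beta:
\begin{align*}
\|f_1 - f_1*\beta\|_2^2
  &= \sum_r |\hat f_1(r)|^2\,|1 - \hat\beta(r)|^2
   && \text{(Parseval and $\widehat{f_1*\beta} = \hat f_1\hat\beta$)} \\
1 - \hat\beta(r) &= \mathbb{E}_x \beta(x)\,(1 - \omega^{-rx})
   = \mathbb{E}_{x\in B}\,(1 - \omega^{-rx})
   && \text{(since $\mathbb{E}_x\beta(x) = 1$)} \\
\sum_{r\in K} |\hat f_1(r)|^2\,|1-\hat\beta(r)|^2
  &\le \theta^2 \|\hat f_1\|_2^2 \le \theta^2 , \\
\sum_{r\notin K} |\hat f_1(r)|^2\,|1-\hat\beta(r)|^2
  &\le 4 \max_{r\notin K}|\hat f_1(r)| \sum_r |\hat f_1(r)| \le 4\delta C .
\end{align*}
% Example choice: \theta \le \xi/\sqrt{2} and \delta \le \xi^2/(8C) give
% \|f_1 - f_1*\beta\|_2^2 \le \xi^2/2 + \xi^2/2 = \xi^2.
```

Note that $|1 - \omega^{-rx}| = |\omega^{rx} - 1|$, so the defining condition of $B$ applies directly in the bound for $r \in K$.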

This tells us that for a typical pair $x$ and $y$, if $x - y \in B$, then $f_1(x)$ and $f_1(y)$ are close.
Equivalently, $f_1$ is almost always roughly constant on translates of $B$.

Now if we choose $\eta(C)$ to be small enough, then $f_2$ is highly quasirandom even compared
with the size of $B$. That is, $\|\hat f_2\|_\infty$ is so small that even the restrictions of $f_2$ to translates
of $B$ behave quasirandomly (in a sense that one can make precise in several natural ways).
This means that even though $f_2$ may have a large $L_2$ norm, we may nevertheless think
of $f_1 + f_2$ as a tiny perturbation of $f_1$. For instance, if $x$ is a typical element of $\mathbb{Z}_N$
and $f_1(x) \ge c$, then the smoothness of $f_1$ guarantees that $f_1(y) \ge c/2$ for almost every
$y \in x + B$. From this and the positivity of $f_1$ it follows (if $B$ satisfies a certain technical
condition that one can always ensure) that

$$\mathbb{E}_{x,d} f_1(x) f_1(x + d) f_1(x + 2d)$$

is bounded below by some (very small) positive constant related to the density of $B$, which
depends on $C$ only. If $\eta(C)$ is much smaller than this constant, then perturbing by $f_2$
cannot change this lower bound to zero.
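A cruder, global version of this perturbation step can be made explicit. The sketch below assumes that $N$ is odd and uses the trilinear notation $\Lambda$, which is introduced here and is not part of the text; the Fourier normalization is the usual one for $\mathbb{Z}_N$ (averages in physical space, sums in frequency space). It does not capture the localization to translates of $B$ described above, only the basic mechanism by which a small $\|\hat f_2\|_\infty$ controls the count of three-term progressions.

```latex
% Notation (an assumption of this sketch), with N odd:
%   \Lambda(u,v,w) = \mathbb{E}_{x,d}\, u(x)\,v(x+d)\,w(x+2d).
\begin{align*}
\Lambda(u,v,w) &= \sum_r \hat u(r)\,\hat v(-2r)\,\hat w(r), \\
|\Lambda(u,v,w)| &\le \|\hat u\|_\infty \sum_r |\hat v(-2r)|\,|\hat w(r)|
   \le \|\hat u\|_\infty\,\|v\|_2\,\|w\|_2 ,
\end{align*}
% by Cauchy--Schwarz and Parseval (r \mapsto -2r is a bijection since N is odd).
% The same bound holds with the \ell_\infty norm placed on \hat v or on \hat w.
% Expanding \Lambda(f_1+f_2, f_1+f_2, f_1+f_2) gives \Lambda(f_1,f_1,f_1) plus
% seven terms, each containing f_2 at least once. If f_1 takes values in
% [0,1] and f_2 in [-1,1], then \|f_1\|_2 \le 1 and \|f_2\|_2 \le 1, so
\[
\bigl|\Lambda(f_1+f_2,f_1+f_2,f_1+f_2) - \Lambda(f_1,f_1,f_1)\bigr|
  \le 7\,\|\hat f_2\|_\infty \le 7\,\eta(C).
\]
```

In particular, under these assumptions a lower bound of more than $7\eta(C)$ for the count with $f_1$ alone survives the addition of $f_2$.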

This is not quite a sketch proof of Roth's theorem (though it is close), because there
remains the problem of dealing with $f_3$. In fact, the correct order to work in is to think
about $f_1$ first, then $f_1 + f_3$, and finally $f_1 + f_2 + f_3$. This is why it is so helpful for $f_1$ and
$f_1 + f_3$ to be non-negative functions.

The above idea can be thought of as a discrete analogue of at least one ergodic-theoretic
proof of Roth's theorem. Tao applied his structure theorem to a sequence of cleverly
constructed algebra norms in order to extend the argument to a proof of the general case
of Szemerédi's theorem. Unfortunately, the analysis of the structured function $f_1$ becomes
far harder: that is where the real difficulty of his argument lies.

The structure theorem can also be used to replace arguments that use Szemerédi's
regularity lemma. This is not too surprising, as in both cases the strength of the result comes
from the fact that the bound on the quasirandomness can be made so small that it is even
small compared with the "natural scale" of the structured part. Similarly, it can be used
to replace a version of Szemerédi's regularity lemma, due to Green [H], that concerns dense
subsets of finite Abelian groups.